Abstract

The bit-wise unequal error protection problem, for the case when the number of groups of bits $\ell$ is fixed, is
considered for variable length block codes with feedback. An encoding scheme based on fixed length block codes
with erasures is used to establish inner bounds to the achievable performance for finite expected decoding time. A
new technique for bounding the performance of variable length block codes is used to establish outer bounds to the
performance for a given expected decoding time. The inner and the outer bounds match one another asymptotically
and characterize the achievable region of rates-exponents vectors completely. The single message message-wise
unequal error protection problem for variable length block codes with feedback is also solved as a necessary step
on the way.






Index Terms 

Unequal Error Protection (UEP), Feedback, Variable-Length Communication, Block Codes, Error Exponents,
Burnashev's Exponent, Yamamoto-Itoh scheme, Kudryashov's signaling, Errors-and-Erasures Decoding,
Variable-Length Block Coding, Discrete Memoryless Channels (DMCs)



I. Introduction 

In the conventional formulation of the digital communication problem, the primary concern is the correct transmission
of the message; hence there is no distinction between different error events. In other words, there is a tacit assumption
that all error events are equally undesirable: incorrectly decoding to a message $\tilde{m}$ when a message $m$ is transmitted
is as undesirable as incorrectly decoding to a message $\tilde{m}'$ when a message $m'$ is transmitted, for any $\tilde{m}$ other than $m$
and $\tilde{m}'$ other than $m'$. Therefore the performance criteria used in the conventional formulation (minimum distance
between codewords, maximum conditional error probability among messages, average error probability, etc.) are
oblivious to any precedence order that might exist among the error events.

In many applications, however, there is a clear order of precedence among the error events. For example, in
Internet communication, packet headers are more important than the actual payload data. Hence, a code used for
Internet communication can enhance the protection against the erroneous transmission of the packet headers at the
expense of the protection against the erroneous transmission of payload data. In order to appreciate such a coding
scheme, one may analyze the error probability of the packet headers and the error probability of the payload data separately,
instead of analyzing the error probability of the overall message composed of packet header and payload data. Such
a formulation for Internet communication is an unequal error protection (UEP) problem, because of the separate
calculation of the error probabilities of the parts of the messages.

Problems capturing the disparity of undesirability among various classes of error events, by assigning and 
analyzing distinct performance criteria for different classes of error events, are called unequal error protection (UEP) 
problems. UEP problems have already been studied widely by researchers in communication theory, coding theory, 
and computer networks from the perspectives of their respective fields. In this paper we enhance the information
theoretic perspective on UEP problems [5] for variable length block codes by generalizing the results of [2]
to the rates below capacity.

In information theoretic UEP, error events are grouped into different classes and the probabilities associated with
these different classes of error events are analyzed separately. In order to prioritize protection against one
class of error events, the corresponding error exponent is increased at the expense of the other error exponents.
There are various ways to choose the error event classes, but two specific choices stand out
because of their intuitive familiarity and practical relevance; they correspond to the message-wise UEP and the
bit-wise UEP. Below, we first describe these two types of UEP and then specify the UEP problems we are interested
in in this manuscript.

In the message-wise UEP, the message set $\mathcal{M}$ is assumed to be the union of $\ell$ disjoint sets for some fixed $\ell$, i.e.,
$\mathcal{M} = \cup_{j=1}^{\ell} \mathcal{M}_j$ where $\mathcal{M}_i \cap \mathcal{M}_j = \emptyset$ for all $i \neq j$. For each set $\mathcal{M}_j$, the maximum error probability¹ $P_e\{j\}$, the rate
$R\{j\}$ and the error exponent $E\{j\}$ are defined as the corresponding quantities in the conventional problem, i.e.,

\[ P_e\{j\} = \max_{m \in \mathcal{M}_j} P[\hat{M} \neq m \mid M = m], \qquad R\{j\} = \frac{\ln |\mathcal{M}_j|}{n}, \qquad E\{j\} = \frac{-\ln P_e\{j\}}{n} \]

for all $j$ in $\{1, 2, \ldots, \ell\}$, where $n$ is the length of the code. The ultimate aim is calculating the achievable region of
rate vector, error exponent vector pairs $(R\{\cdot\}, E\{\cdot\})$, where² $R\{\cdot\} = (R\{1\}, R\{2\}, \ldots, R\{\ell\})$ and
$E\{\cdot\} = (E\{1\}, E\{2\}, \ldots, E\{\ell\})$. The message-wise
UEP problem was the first information theoretic UEP problem to be considered; it was considered by Csiszár in his
work on joint source channel coding [5]. Csiszár showed that for any integer $\ell$, block length $n$ and $\ell$-dimensional
rate vector $R\{\cdot\}$ such that $0 \leq R\{j\} \leq C$ for $j = 1, 2, \ldots, \ell$, there exists a length $n$ block code with message set
$\mathcal{M} = \cup_{j=1}^{\ell} \mathcal{M}_j$ where $|\mathcal{M}_j| = e^{n(R\{j\} - \epsilon_n)}$ such that the conditional error probability of each message in each $\mathcal{M}_j$
is less than $e^{-n(E_r(R\{j\}) - \epsilon_n)}$, where $E_r(\cdot)$ is the random coding exponent and $\epsilon_n$ converges to zero as $n$ diverges³.

The bit-wise UEP problem is the other canonical form of the information theoretic UEP problems. In the bit-wise
UEP problem the message set $\mathcal{M}$ is assumed to be the Cartesian product of $\mathcal{M}_1, \mathcal{M}_2, \ldots, \mathcal{M}_\ell$ for some fixed
$\ell$, i.e., $\mathcal{M} = \mathcal{M}_1 \times \mathcal{M}_2 \times \cdots \times \mathcal{M}_\ell$. Thus the transmitted message $M$ and the decoded message $\hat{M}$ are given
by $M = (M_1, M_2, \ldots, M_\ell)$ and $\hat{M} = (\hat{M}_1, \hat{M}_2, \ldots, \hat{M}_\ell)$, respectively. Furthermore, the $M_j$'s and $\hat{M}_j$'s are called the
transmitted and decoded sub-messages, respectively. The error events of interest in the bit-wise UEP problem are
the ones corresponding to the erroneous transmission of the sub-messages. The error probability $P_e(j)$, the rate $R_j$ and
the error exponent $E_j$ of the sub-messages are given by

\[ P_e(j) = P[\hat{M}_j \neq M_j], \qquad R_j = \frac{\ln |\mathcal{M}_j|}{n}, \qquad E_j = \frac{-\ln P_e(j)}{n} \]

for all $j$ in $\{1, 2, \ldots, \ell\}$, where $n$ is the block length. As was the case in the message-wise UEP problem, the ultimate aim
in the bit-wise UEP problem is determining the achievable region of the rate vector, error exponent vector pairs⁴
$(R, E)$. The formulation of the Internet communication problem we have considered above, with packet header and
payload data, is a bit-wise UEP problem with two sub-messages, i.e., with $\ell = 2$.

There is some resemblance in the definitions of message-wise and bit-wise UEP problems, but they have very
different behavior in many problems. For example, consider the message-wise UEP problem and the bit-wise
UEP problem with $\ell = 2$, $\mathcal{M}_1 = \{1, 2\}$ and $\mathcal{M}_2 = \{3, 4, \ldots, e^{n(C - \epsilon_n)}\}$ for some $\epsilon_n$ that goes to zero as $n$
diverges. It is shown in [2, Theorem 1] that if $\mathcal{M} = \mathcal{M}_1 \times \mathcal{M}_2$ and $P[\hat{M}_2 \neq M_2] \leq \epsilon_n$ for some $\epsilon_n$ that goes to
zero as $n$ diverges, then⁵ $E_1 = 0$. Thus in the bit-wise UEP problem even a bit can not have a positive error exponent.
As a result of [5, Theorem 5], on the other hand, if $\mathcal{M} = \mathcal{M}_1 \cup \mathcal{M}_2$ we know that $\mathcal{M}_1$ can have an error exponent
$E\{1\}$ as high as $E_r(0) > 0$ while having a small error probability for $\mathcal{M}_2$, i.e.,
$\max_{m \in \mathcal{M}_2} P[\hat{M} \neq m \mid M = m] \leq \epsilon_n$ for
some $\epsilon_n$ that goes to zero as $n$ diverges. Thus in the message-wise UEP problem it is possible to give an error
exponent as high as $E_r(0)$ to $\mathcal{M}_1$.

The message-wise and the bit-wise UEP problems cover a wide range of problems of practical interest. Yet, 
as noted in [2], there are many UEP problems of practical importance that are neither message-wise nor bit-wise
UEP problems. One of our aims in studying the message-wise and the bit-wise UEP problems is gaining insights 
and devising tools for the analysis of those more complicated problems. 

1 This formulation is called the missed detection formulation of the message-wise UEP problem in [2]. If $P[\hat{M} \neq m \mid M = m]$ is replaced
with $P[\hat{M} = m \mid M \neq m]$ we get the false alarm formulation of the message-wise UEP problem. In this paper we restrict our discussion
to the missed detection problem and use message-wise UEP without any qualifications to refer to the missed detection formulation of the
message-wise UEP problem.

2 Here $\ell$ is assumed to be a fixed integer. All rates-exponents vectors, achievable or not, are in the region of $\mathbb{R}^{2\ell}$ in which $R\{j\} \geq 0$ and
$E\{j\} \geq 0$ for all $1 \leq j \leq \ell$. $\mathbb{R}^{2\ell}$ is the $2\ell$ dimensional real vector space with the norm $\|x\| = \sup_j |x_j|$.

3 Csiszár proved the above result not only for the case when $\ell$ is constant for all $n$ but also for the case when $\ell_n$ is a sequence such that
$\lim_{n \to \infty} \frac{\ln \ell_n}{n} = 0$. See [5, Theorem 5].

4 Similar to the message-wise UEP problem discussed above, in the current formulation of the bit-wise UEP problem we assume $\ell$ to be fixed.
Thus all rates-exponents vectors, achievable or not, are in the region of $\mathbb{R}^{2\ell}$ in which $R_j \geq 0$ and $E_j \geq 0$ for all $1 \leq j \leq \ell$, by definition.

5 The channel is assumed to have no zero probability transition.

In the above discussion the UEP problems are described for fixed length block codes for the sake of simplicity.
One can, however, easily define the corresponding problems for various families of codes, with or without feedback,
fixed or variable length, by modifying the definitions of the error probability, the rate and the error exponent
appropriately. Furthermore, the parameter $\ell$ representing the number of groups of bits or messages is assumed to be
fixed in the above discussion for simplicity. However, both the message-wise and the bit-wise UEP problems can be
defined for $\ell$'s that are increasing with block length $n$ in fixed length block codes and for $\ell$'s that are increasing with
expected block length $E[T]$ in variable length block codes. In fact Csiszár's result discussed above, [5, Theorem
5], is proved not only for constant $\ell$ but also for any sequence $\ell_n$ satisfying $\lim_{n \to \infty} \frac{\ln \ell_n}{n} = 0$.

In this manuscript we consider two closely related UEP problems for variable length block codes over a discrete
memoryless channel with noiseless feedback: the bit-wise UEP problem and the single message message-wise
UEP problem.

• In the bit-wise UEP problem there are $\ell$ sub-messages, each with a different priority and rate. For all fixed values
of $\ell$ we characterize the trade-off between the rates and the error exponents of these sub-messages by revealing
the region of achievable rate vector, exponent vector pairs. For fixed $\ell$ this problem is simply the variable
length code version of the bit-wise UEP problem described above.

• In the single message message-wise UEP problem, we characterize the trade-off between the exponents of the
minimum and the average conditional error probability. Thus this problem is similar to the message-wise UEP
problem described above for the case $\ell = 2$ and $\mathcal{M}_1 = \{1\}$. But unlike that problem we work with variable
length codes and the average conditional error probability rather than fixed length codes and the maximum error
probability.

The bit-wise UEP problem for a fixed number of groups of bits, i.e., fixed $\ell$, and the single message message-wise
UEP problem were first considered in [2], for the case when the rate is (very close to) the channel capacity; we
solve both of these problems for all achievable rates.

In fact, in [2] the single message message-wise UEP problem is solved not only at capacity, but also for all the
rates below capacity, both for fixed length block codes without feedback and for variable length block codes with
feedback, but only for the case when the overall error exponent is zero (see [2, Appendix D]). Recently Wang, Chandar,
Chung and Wornell [11] put forward a new proof based on the method of types for the same problem⁶. Nazer, Shkel and
Draper [9], on the other hand, investigated the problem for fixed length block codes on additive white Gaussian noise
channels at zero error exponent and derived the exact analytical expression in terms of rate and power constraints.

Before starting our presentation, let us give a brief outline of the paper. In Section II, we specify the channel
model and give a brief overview of stopping times and variable length block codes. In Section III we first present
the single message message-wise UEP problem and the fixed $\ell$ version of the bit-wise UEP problem for variable length
block codes; then we state the solutions of these two UEP problems. In Section IV we present inner bounds for
both the single message message-wise UEP problem and the bit-wise UEP problem. In Section V we introduce
a new technique, Lemma 5, for deriving outer bounds for variable length block codes and apply it to the two
UEP problems we are interested in. Finally in Section VI we discuss the qualitative ramifications of our results
in terms of the design of communication systems with UEP and the limitations of our analysis. The proofs of the
propositions in Sections III, IV and V are deferred to the Appendices.



II. Preliminaries

As is customary we use upper case letters, e.g., $M$, $X$, $Y$, $T$, for random variables and lower case letters, e.g.,
$m$, $x$, $y$, $t$, for their sample values.

We denote discrete sets by capital letters with calligraphic fonts, e.g., $\mathcal{M}$, $\mathcal{X}$, $\mathcal{Y}$, and power sets of discrete sets
by $\wp(\cdot)$, e.g., $\wp(\mathcal{M})$, $\wp(\mathcal{X})$, $\wp(\mathcal{Y})$. In order to denote the set of all probability distributions on a discrete set we
use $\mathcal{P}(\cdot)$, e.g., $\mathcal{P}(\mathcal{M})$, $\mathcal{P}(\mathcal{X})$, $\mathcal{P}(\mathcal{Y})$.

Definition 1 (Total Variation): For any discrete set $\mathcal{Z}$ and for any $\mu_1, \mu_2 \in \mathcal{P}(\mathcal{Z})$ the total variation $\Delta(\mu_1, \mu_2)$
is defined as,

(1)



6 In addition to their new proof for the missed-detection problem [11, Theorem 1], Wang, Chandar, Chung and Wornell present a completely
new result on the false-alarm formulation of the problem [11, Theorem 5].

We denote the indicator function by $1_{\{\cdot\}}$, i.e., $1_{\{\Gamma\}} = 1$ when the event $\Gamma$ happens and $1_{\{\Gamma\}} = 0$ otherwise.
We denote the binary entropy function by $h(\cdot)$, i.e.,

\[ h(s) = -s \ln s - (1 - s) \ln(1 - s) \qquad \forall s \in [0, 1]. \tag{2} \]

A. Channel Model 

We consider a discrete memoryless channel (DMC) with input alphabet $\mathcal{X}$, output alphabet $\mathcal{Y}$ and $|\mathcal{X}|$-by-$|\mathcal{Y}|$
transition probability matrix $W$. Each row of $W$ corresponds to a probability distribution on $\mathcal{Y}$, i.e., $W_x \in \mathcal{P}(\mathcal{Y})$
for all $x \in \mathcal{X}$. For reasons that will become clear shortly, in Section II-D, we assume that $W_x(y) > 0$ for all
$x \in \mathcal{X}$ and $y \in \mathcal{Y}$ and denote the smallest transition probability by $\lambda$:

\[ \lambda = \min_{x, y} W_x(y) > 0. \tag{3} \]

The input and output letters at time $\tau$, up to time $\tau$ and between time $\tau_1$ and $\tau_2$ are denoted by $X_\tau$, $Y_\tau$, $X^\tau$, $Y^\tau$,
$X_{\tau_1}^{\tau_2}$ and $Y_{\tau_1}^{\tau_2}$ respectively. DMCs are both memoryless and stationary, hence the conditional probability of $Y_\tau = y$
given $(X^\tau, Y^{\tau-1})$ is given by

\[ P[Y_\tau = y \mid X^\tau, Y^{\tau-1}] = W_{X_\tau}(y). \]

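Definition 2 (Empirical Distribution): For any $\tau_2 \geq \tau_1$ and any sequence $z_{\tau_1}^{\tau_2}$ such that $z_j \in \mathcal{Z}$ for all $j \in [\tau_1, \tau_2]$,
the empirical distribution $Q_{\{z_{\tau_1}^{\tau_2}\}}$ is given by

\[ Q_{\{z_{\tau_1}^{\tau_2}\}}(z) = \frac{1}{\tau_2 - \tau_1 + 1} \sum_{\tau = \tau_1}^{\tau_2} 1_{\{z_\tau = z\}} \qquad \forall z \in \mathcal{Z}. \tag{4} \]

Note that if we replace $z_{\tau_1}^{\tau_2}$ by $Z_{\tau_1}^{\tau_2}$, then the empirical distribution $Q_{\{Z_{\tau_1}^{\tau_2}\}}(z)$ becomes a random variable for each
$z \in \mathcal{Z}$.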
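As a small illustration of Definition 2, the following Python sketch computes the empirical distribution of a finite segment as exact fractions. The function name `empirical_distribution` and the sample segment are hypothetical and chosen only for this example.

```python
from collections import Counter
from fractions import Fraction


def empirical_distribution(seq, alphabet):
    """Return Q(z) = (number of positions holding z) / len(seq) for each z.

    `seq` plays the role of the segment z_{tau_1}, ..., z_{tau_2}; its length
    is tau_2 - tau_1 + 1. Letters of the alphabet that never occur get 0.
    """
    if not seq:
        raise ValueError("sequence must be non-empty")
    counts = Counter(seq)
    n = len(seq)
    return {z: Fraction(counts[z], n) for z in alphabet}


# Hypothetical binary segment of length 5 (illustration only).
segment = [0, 1, 1, 0, 1]
q = empirical_distribution(segment, alphabet=(0, 1))
print(q[0], q[1])  # 2/5 3/5
```

Using `Fraction` keeps the values exact, so the entries always sum to one without rounding error.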

B. Stopping Times 

Stopping times are central in the formal treatment of variable length codes; it is not possible to define or 
comprehend variable length codes without a solid understanding of stopping times. For those readers who are not 
already familiar with the concept of the stopping times, we present a brief overview in this section. 

In order to make our presentation more accessible, we use the concept of power sets, rather than sigma-fields,
in the definitions. We can do that only because the random variables we use to define stopping times are discrete
random variables. In the general case, when the underlying variables are not necessarily discrete, one needs to use
the concept of sigma-fields instead of power sets.

Let us start with introducing the concept of Markov times. For an infinite sequence of random variables $Z_1, Z_2, \ldots$,
a positive, integer* valued⁷ function $T$ defined on $Z^\infty$ is a Markov time, if for all positive integers $\tau$ it is possible
to determine whether $T = \tau$ or not by considering $Z^\tau$ only, i.e., if $1_{\{T = \tau\}}$ is not only a function of $Z^\infty$ but also a
function of $Z^\tau$ for all positive integers $\tau$. The formal definition is given below.

Definition 3 (Markov Time): Let $Z^\infty$ be an infinite sequence of $\mathcal{Z}$ valued random variables $Z_\tau$ for $\tau \in \{1, 2, \ldots\}$
and $T$ be a function of $Z^\infty$ which takes values from the set $\{1, 2, \ldots, \infty\}$. Then the random variable $T$ is a Markov
time with respect to $Z_\tau$ if

\[ \{z^\infty : T = \tau \text{ if } Z^\infty = z^\infty\} \in \wp(\mathcal{Z}^\tau) \times \{\mathcal{Z}_{\tau+1}^\infty\} \qquad \forall \tau \in \{1, 2, \ldots\} \tag{5} \]

where $\wp(\mathcal{Z}^\tau) \times \{\mathcal{Z}_{\tau+1}^\infty\}$ is the Cartesian product of the power set of $\mathcal{Z}^\tau$ and the one element set $\{\mathcal{Z}_{\tau+1}^\infty\}$.

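We denote the $Z_\tau$'s from $\tau = 1$ to $\tau = T$ by $Z^T$ and their sample values by $z^T$. The set of all sample values of $Z^T$
such that $T = \tau$, on the other hand, is denoted by $\mathcal{Z}^T_{\{T = \tau\}}$. We denote the union of all $\mathcal{Z}^T_{\{T = \tau\}}$'s for finite $\tau$'s by $\mathcal{Z}^{T*}$
and the union of all $\mathcal{Z}^T_{\{T = \tau\}}$'s by $\mathcal{Z}^T$, i.e.,

\[ \mathcal{Z}^T_{\{T = \tau\}} = \{z^\tau : T = \tau \text{ if } Z^\tau = z^\tau\} \qquad \tau \in \{1, 2, \ldots, \infty\} \tag{6a} \]
\[ \mathcal{Z}^{T*} = \bigcup_{1 \leq \tau < \infty} \mathcal{Z}^T_{\{T = \tau\}} \tag{6b} \]
\[ \mathcal{Z}^T = \mathcal{Z}^{T*} \cup \mathcal{Z}^T_{\{T = \infty\}}. \tag{6c} \]

For an arbitrary, positive, integer* valued function $T$ of $Z^\infty$, however, one can not talk about $Z^T$, because the value
of $T$ can in principle depend on $Z_{T+1}^\infty$. For a Markov time $T$, however, the value of $T$ does not depend on $Z_{T+1}^\infty$.
That is why we can define $Z^T$, $\mathcal{Z}^T_{\{T = \tau\}}$, $\mathcal{Z}^{T*}$ and $\mathcal{Z}^T$ for any Markov time $T$.

Given an infinite sequence of $z_\tau$'s, i.e., $z^\infty$, either $z^\infty \in \mathcal{Z}^T_{\{T = \infty\}}$ or $z^\infty$ has a unique subsequence $z^\tau$ that is in
$\mathcal{Z}^{T*}$.

7 Integer* is the set of all integers together with two infinities, i.e., $\{-\infty, \ldots, -1, 0, 1, \ldots, \infty\}$.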

In most practical situations, one is interested in Markov times that are guaranteed to have a finite value; those
Markov times are called stopping times.

Definition 4 (Stopping Time): A Markov time $T$ with respect to $Z_\tau$ is a stopping time iff $P[T < \infty] = 1$.

Note that if $T$ is a stopping time then $P[Z^T \in \mathcal{Z}^{T*}] = 1$. Furthermore, unlike $\mathcal{Z}^T$, $\mathcal{Z}^{T*}$ is a countable set for all
stopping times $T$ because $|\mathcal{Z}|$ is finite⁸.
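To make the Markov time property concrete, here is a minimal Python sketch of one possible rule: stop the first time the number of ones and the number of zeros seen so far differ by a given threshold. The rule, the function name `first_crossing_time` and the sample sequences are illustrative assumptions, not taken from the paper. The point is that the decision "T = tau" is made from the prefix $z^\tau$ alone, so two sequences that share a prefix up to the stopping point yield the same value of $T$.

```python
def first_crossing_time(sequence, threshold):
    """Return the first tau (1-indexed) at which |#ones - #zeros| in the
    prefix z^tau reaches `threshold`, or None if that never happens within
    the finite sequence supplied."""
    balance = 0
    for tau, z in enumerate(sequence, start=1):
        balance += 1 if z == 1 else -1
        if abs(balance) >= threshold:
            return tau  # decided using z_1, ..., z_tau only
    return None


a = [1, 0, 1, 1, 1, 0, 0, 0]
b = [1, 0, 1, 1, 1, 1, 1, 1]  # same first five letters as `a`
print(first_crossing_time(a, 3), first_crossing_time(b, 3))  # 5 5
```

Whether such a rule is also a stopping time depends on the distribution of the sequence: it must reach the threshold with probability one.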



C. Variable Length Block Codes 

A variable length block code on a DMC is given by a random decoding time $T$, an encoding scheme $\Phi$ and a
decoding rule $\Psi$ satisfying $P[T < \infty] = 1$.

• Decoding time $T$ is a Markov time with respect to the receiver's observation $Y_\tau$, i.e., given $Y^\tau$ the receiver knows
whether $T = \tau$ or not. Hence $T$ is a random quantity rather than a constant, thus neither the encoder nor the
decoder knows the value of $T$ a priori. But as time passes, both the decoder and the encoder (because of the
feedback link) will be able to decide whether $T$ has been reached or not, just by considering the current and
past channel outputs.

• Encoding scheme $\Phi$ is a collection of mappings which determines the input letter at time $(\tau + 1)$ for each
message in the finite message set $\mathcal{M}$, for each $y^\tau \in \mathcal{Y}^\tau$ such that $T > \tau$,

\[ \Phi(\cdot, y^\tau) : \mathcal{M} \to \mathcal{X} \qquad \forall y^\tau : T > \tau. \]

• Decoding rule $\Psi$ is a mapping from the set of output sequences $y^\tau$ such that $T = \tau$ to the finite message set $\mathcal{M}$
which determines the decoded message, $\hat{M}$. With a slight abuse of notation we denote the set of all, possibly
infinite, output sequences $y^\tau$ such that $\{T = \tau \text{ if } Y^\tau = y^\tau\}$ by⁹ $\mathcal{Y}^T$ and write the decoding rule $\Psi$ as,

\[ \Psi(\cdot) : \mathcal{Y}^T \to \mathcal{M}. \]

• Note that because of the condition $P[T < \infty] = 1$, the decoding time is not only a Markov time, but also a
stopping time¹⁰.

At time zero the message $M$, chosen uniformly at random from $\mathcal{M}$, is given to the transmitter; the transmitter uses
the codeword associated with $M$, i.e., $\Phi(M, \cdot)$, to convey the message $M$ until the decoding time $T$. Then the receiver
chooses the decoded message $\hat{M}$ using its observation $Y^T$ and the decoding rule $\Psi$, i.e., $\hat{M} = \Psi(Y^T)$. The error
probability, the rate and the error exponent of a variable length block code are given by



\[ P_e = P[\hat{M} \neq M], \qquad R = \frac{\ln |\mathcal{M}|}{E[T]}, \qquad E = \frac{-\ln P_e}{E[T]}. \tag{7} \]



Indeed one can interpret the variable length block codes on DMCs as trees; for a more detailed discussion of this
interpretation readers may go over [1, Section II].

8 $\mathcal{Z}^{T*}$ is a countable set even when $|\mathcal{Z}|$ is countably infinite.
9 See equation (5).

10 Having a finite decoding time with probability one, i.e., $P[T < \infty] = 1$, does not imply having a finite expected value for the decoding
time, i.e., $E[T] < \infty$. Thus a variable length code can, in principle, have an infinite expected decoding time.

D. Reliable Sequences for Variable Length Block Codes

In order to suppress the secondary terms while discussing the main results, we use the concept of reliable
sequences. In a sequence of codes we denote the error probability and the message set of the $\kappa$-th code of the
sequence by $P_e^{(\kappa)}$ and $\mathcal{M}^{(\kappa)}$, respectively.

Definition 5 (Reliable Sequence): A sequence of variable length block codes $Q$ is reliable if the error probabilities
of the codes vanish and the sizes of the message sets of the codes diverge,

\[ \lim_{\kappa \to \infty} \left( P_e^{(\kappa)} + \frac{1}{|\mathcal{M}^{(\kappa)}|} \right) = 0, \]

where $P_e^{(\kappa)}$ and $\mathcal{M}^{(\kappa)}$ are the error probability and the message set for the $\kappa$-th code of the reliable sequence,
respectively.

Note that in a sequence of codes, each code has an associated probability space. We denote the random variables
in these probability spaces together with a superscript corresponding to the code. For example the decoding time of
the $\kappa$-th code in the sequence is denoted by $T^{(\kappa)}$. The expected value of random variables in the probability space
associated with the $\kappa$-th code in the sequence is denoted¹² by $E^{(\kappa)}[\cdot]$.

Definition 6 (Rate of a Reliable Sequence): The rate of a reliable sequence $Q$ is the limit infimum of the rates
of the individual codes,

\[ R_Q = \liminf_{\kappa \to \infty} \frac{\ln |\mathcal{M}^{(\kappa)}|}{E^{(\kappa)}[T^{(\kappa)}]}. \]

Definition 7 (Capacity): The capacity of a channel for variable length block codes is the supremum of the rates
of all reliable sequences,

\[ C = \sup_Q R_Q. \]

The capacity of a DMC for variable length block codes is identical to the usual channel capacity [3]. Hence,

\[ C = \max_{\mu \in \mathcal{P}(\mathcal{X})} \sum_{x, y} \mu(x) W_x(y) \ln \frac{W_x(y)}{\bar{\mu}(y)} \tag{8} \]

where $\bar{\mu}(y) = \sum_x \mu(x) W_x(y)$.

Definition 8 (Error Exponent of a Reliable Sequence): The error exponent of a reliable sequence $Q$ is the limit
infimum of the error exponents of the individual codes,

\[ E_Q = \liminf_{\kappa \to \infty} \frac{-\ln P_e^{(\kappa)}}{E^{(\kappa)}[T^{(\kappa)}]}. \]

Definition 9 (Reliability Function): The reliability function of a channel for variable length block codes at rate
$R \in [0, C]$ is the supremum of the exponents of all reliable sequences whose rate is $R$ or higher,

\[ E(R) = \sup_{Q : R_Q \geq R} E_Q. \]

Burnashev [3] analyzed the performance of variable length block codes with feedback and established inner and
outer bounds to their performance. The results of [3] determine the reliability function of variable length block codes
on DMCs for all rates. According to [3]:

11 Recall that the decoding time of a variable length block code is finite with probability one. Thus $P^{(\kappa)}[T^{(\kappa)} < \infty] = 1$ for all $\kappa$ for a
reliable sequence.

12 Evidently it is possible to come up with a probability space that includes all of the codes in a reliable sequence and invoke independence
between random quantities associated with different codes. We choose the current convention to emphasize independence explicitly in the
notation we use.

• If all entries of $W$ are positive then

\[ E(R) = \left( 1 - \frac{R}{C} \right) D \qquad \forall R \in [0, C] \]

where $D$ is the maximum Kullback-Leibler divergence between the output distributions of any two input letters:

\[ D = \max_{x, \tilde{x}} \mathrm{D}(W_x \| W_{\tilde{x}}). \tag{9} \]

• If there are one or more zero entries¹⁴ in $W$, i.e., if there are two input letters $x$, $\tilde{x}$ and an output letter $y$ such
that $W_x(y) = 0$ and $W_{\tilde{x}}(y) > 0$, then for all $R < C$, for large enough $E[T]$ there are rate $R$ variable length
block codes which are error free, i.e., $P_e = 0$.

When $P_e = 0$ all error events can have zero probability at the same time. Consequently all the UEP problems are
answered trivially when there is a zero probability transition. This is why we have assumed that $W_x(y) > 0$ for all
$x \in \mathcal{X}$ and $y \in \mathcal{Y}$.

We denote the input letters that attain this maximum value of the Kullback-Leibler divergence by¹⁵ $a$ and $r$:

\[ D = \mathrm{D}(W_a \| W_r). \tag{10} \]
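As a numerical sanity check of the quantities $C$, $D$ and $E(R)$ defined above, the sketch below evaluates them for a binary symmetric channel with crossover probability $p = 0.01$. It relies on two standard closed forms for the BSC that are assumptions of this example rather than statements of the paper: $C = \ln 2 - h(p)$, and $D = (1 - 2p) \ln \frac{1-p}{p}$, which is the divergence between the two rows of $W$. The function names are hypothetical. The resulting values agree with those quoted in the caption of Fig. 2.

```python
import math


def binary_entropy(s):
    """h(s) in nats, using the convention 0 * ln(0) = 0."""
    if s in (0.0, 1.0):
        return 0.0
    return -s * math.log(s) - (1 - s) * math.log(1 - s)


def bsc_capacity(p):
    """Capacity of a BSC in nats per channel use: ln 2 - h(p)."""
    return math.log(2) - binary_entropy(p)


def bsc_max_divergence(p):
    """D = D(W_0 || W_1) for a BSC with 0 < p < 1/2."""
    return (1 - 2 * p) * math.log((1 - p) / p)


def burnashev_exponent(rate, capacity, d):
    """E(R) = (1 - R/C) * D for 0 <= R <= C."""
    if not 0 <= rate <= capacity:
        raise ValueError("rate must lie in [0, C]")
    return (1 - rate / capacity) * d


p = 0.01
C = bsc_capacity(p)
D = bsc_max_divergence(p)
print(round(C, 4), round(D, 3))  # 0.6371 4.503
print(burnashev_exponent(C / 2, C, D))  # half of D
```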

III. Problem Statement and Main Results 

A. Problem Statement 

For each $m \in \mathcal{M}$, the conditional error probability is defined as¹⁶

\[ P_{e|m} = P[\hat{M} \neq M \mid M = m]. \tag{11} \]



In the conventional setting we are interested in either the average or the maximum of the conditional error probability
of the messages. The behavior of the minimum conditional error probability is scarcely investigated. The single message
message-wise UEP problem attempts to answer that question by determining the trade-off between the exponential decay
rates of $P_e$ and $\min_{m \in \mathcal{M}} P_{e|m}$. The operational definition of the problem in terms of reliable sequences is as follows.

Definition 10 (Single Message Message-wise UEP Problem): For any reliable sequence $Q$ the missed detection
exponent of the reliable sequence $Q$ is defined as

\[ E_{md,Q} = \liminf_{\kappa \to \infty} \frac{-\ln \min_{m \in \mathcal{M}^{(\kappa)}} P_{e|m}^{(\kappa)}}{E^{(\kappa)}[T^{(\kappa)}]} \tag{12} \]

where $P_{e|m}^{(\kappa)}$ is the conditional error probability of the message $m$ for the $\kappa$-th code of the reliable sequence $Q$.

For any rate $R \in [0, C]$ and error exponent $E \in [0, (1 - \frac{R}{C}) D]$¹⁷, the missed detection exponent $E_{md}(R, E)$ is
defined as,

\[ E_{md}(R, E) = \sup_{Q : R_Q \geq R,\ E_Q \geq E} E_{md,Q}. \tag{13} \]

13 The problem is formulated somewhat differently in [3]; as a result [3] did not deal with the case $E[T] = \infty$. The bounds in [3] do
not guarantee that the error probability of a variable length code with infinite expected decoding time is greater than zero; however, this
is the case if all the transition probabilities are positive. To see that, consider a channel with positive minimum transition probability $\lambda$,
i.e., $\lambda = \min_{x, y} W_x(y) > 0$. In such a channel the error probability of any variable length code is bounded below by a strictly positive
quantity determined by $\lambda$ and the decoding time $T$; thus $P_e > 0$ as $\lambda > 0$ and
$P[T < \infty] = 1$. Consequently both the rate and the error exponent are zero for variable length block codes with infinite expected decoding
time. A more detailed discussion of this fact can be found in the Appendix.

14 Note that in this situation $D = \infty$.

15 This particular naming of letters is reminiscent of the use of these letters in the Yamamoto-Itoh scheme [12]. Although they are named
differently in [12], $a$ is used for accepting and $r$ is used for rejecting the tentative decision in the Yamamoto-Itoh scheme.

16 Later in the paper we consider block codes with erasures. The conditional error probabilities, $P_{e|m}$ for $m \in \mathcal{M}$, are defined slightly
differently for them, see equation (24).

17 Burnashev's expression for the error exponent of variable length block codes is used explicitly in the definition because we know, as a result
of [3], that the error exponents of all reliable sequences are upper bounded by Burnashev's exponent. An alternative definition oblivious to
Burnashev's result can simply define $E_{md}(R, E)$ for all rates-exponents vectors that are achievable. That definition is equivalent to Definition
10 because of [3].

In variable length block codes with feedback, the single message message-wise UEP problem not only answers
a curious question about the decay rate of the minimum conditional error probability of a code, but also plays a
key role in the bit-wise UEP problem, which is our main focus in this manuscript.

Though they are central in the message-wise UEP problems, the conditional error probabilities of the messages
are not relevant in the bit-wise UEP problems. In the bit-wise UEP problems we analyze the error probabilities of
groups of sub-messages. In order to do that, consider a code with a message set $\mathcal{M}$ of the form

\[ \mathcal{M} = \mathcal{M}_1 \times \mathcal{M}_2 \times \cdots \times \mathcal{M}_\ell. \]

Then the transmitted message $M$ and the decoded message $\hat{M}$ of the code are of the form

\[ M = (M_1, M_2, \ldots, M_\ell) \]
\[ \hat{M} = (\hat{M}_1, \hat{M}_2, \ldots, \hat{M}_\ell) \]

where $M_j, \hat{M}_j \in \mathcal{M}_j$ for all $j = 1, 2, \ldots, \ell$. Furthermore $M_j$ and $\hat{M}_j$ are called the $j$-th transmitted sub-message and
the $j$-th decoded sub-message, respectively.

The error probabilities we are interested in correspond to the erroneous transmission of certain parts of the message.
In order to define them succinctly let us define $\mathcal{M}^j$, $M^j$ and $\hat{M}^j$ for all $j$ between one and $\ell$ as follows:

\[ \mathcal{M}^j = \mathcal{M}_1 \times \mathcal{M}_2 \times \cdots \times \mathcal{M}_j \]
\[ M^j = (M_1, M_2, \ldots, M_j) \]
\[ \hat{M}^j = (\hat{M}_1, \hat{M}_2, \ldots, \hat{M}_j). \]

Then $P_e(j)$ is defined¹⁸ as the probability of the event that $\hat{M}^j \neq M^j$,

\[ P_e(j) = P[\hat{M}^j \neq M^j] \qquad \text{for } j = 1, 2, \ldots, \ell. \tag{14} \]

Note that if $\hat{M}^j \neq M^j$ then $\hat{M}^i \neq M^i$ for all $i$ greater than $j$. Thus

\[ P_e(1) \leq P_e(2) \leq P_e(3) \leq \cdots \leq P_e(\ell). \tag{15} \]
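The monotonicity in (15) can be checked on a toy model. The sketch below assumes, purely for illustration, that the $\ell$ sub-messages are decoded incorrectly independently of one another with given probabilities; this independence assumption and the function name `prefix_error_probabilities` are not part of the paper. The code enumerates every error pattern and accumulates the probability that at least one of the first $j$ sub-messages is wrong, which is $P_e(j)$ as defined in (14).

```python
from itertools import product


def prefix_error_probabilities(sub_error_probs):
    """Return [P_e(1), ..., P_e(l)] for independent sub-message errors.

    sub_error_probs[i] is the probability that sub-message i + 1 is decoded
    incorrectly. P_e(j) is the probability that the first j decoded
    sub-messages differ from the first j transmitted ones in any position.
    """
    l = len(sub_error_probs)
    result = [0.0] * l
    # Enumerate every error pattern (True means that sub-message is wrong).
    for pattern in product([False, True], repeat=l):
        prob = 1.0
        for wrong, q in zip(pattern, sub_error_probs):
            prob *= q if wrong else (1 - q)
        for j in range(l):
            if any(pattern[: j + 1]):
                result[j] += prob
    return result


# Hypothetical error probabilities, most protected sub-message first.
pe = prefix_error_probabilities([1e-3, 1e-2, 1e-1])
print(pe)
```

In this toy model the printed values are non-decreasing in $j$, as (15) requires, and $P_e(1)$ equals the error probability of the first sub-message alone.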



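Definition 11 (Bit-wise UEP Problem For Fixed $\ell$): For any positive integer $\ell$ let $Q$ be a reliable sequence whose
message sets $\mathcal{M}^{(\kappa)}$ are of the form $\mathcal{M}^{(\kappa)} = \mathcal{M}_1^{(\kappa)} \times \mathcal{M}_2^{(\kappa)} \times \cdots \times \mathcal{M}_\ell^{(\kappa)}$. Then the entries of the rate vector $R_Q$
and the error exponent vector $E_Q$ are defined as

\[ R_{Q,j} = \liminf_{\kappa \to \infty} \frac{\ln |\mathcal{M}_j^{(\kappa)}|}{E^{(\kappa)}[T^{(\kappa)}]} \qquad \forall j \in \{1, 2, \ldots, \ell\} \]
\[ E_{Q,j} = \liminf_{\kappa \to \infty} \frac{-\ln P_e(j)^{(\kappa)}}{E^{(\kappa)}[T^{(\kappa)}]} \qquad \forall j \in \{1, 2, \ldots, \ell\}. \]

A rates-exponents vector $(R, E)$ is achievable if and only if there exists a reliable sequence $Q$ such that $(R, E) = (R_Q, E_Q)$.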

This definition of the bit-wise UEP problem is slightly different from the one described in the introduction,
because $P_e(j)$ is defined as $P[\hat{M}^j \neq M^j]$ rather than $P[\hat{M}_j \neq M_j]$. Note that if $\hat{M}_j \neq M_j$ then $\hat{M}^j \neq M^j$;
consequently $P[\hat{M}^j \neq M^j] \geq P[\hat{M}_j \neq M_j]$ for all $j$'s. In addition, if we assume without loss of generality that
$P[\hat{M}_j \neq M_j] \geq P[\hat{M}_i \neq M_i]$ for all $j > i$, the union bound implies that $P[\hat{M}^j \neq M^j] \leq j P[\hat{M}_j \neq M_j]$. Thus
for the case when $\ell$ is fixed, both formulations of the problem result in exactly the same achievable region of
rates-exponents vectors.

The achievable region of rates-exponents vectors could have been defined as the closure of the points of the
form $(R_Q, E_Q)$ for some reliable sequence $Q$. Using the definition of the $(R_Q, E_Q)$'s one can easily show that, in this
case too, both definitions result in exactly the same achievable region of rates-exponents vectors.



18 Similar to the conditional error probabilities, $P_{e|m}$'s for $m \in \mathcal{M}$, the error probabilities of sub-messages, $P_e(j)$'s for $j = 1, 2, \ldots, \ell$, are
defined slightly differently for codes with erasures; see the definition given for codes with erasures later in the paper.

B. Main Results

For variable length block codes with feedback, the results of both the single message message-wise UEP problem
and the bit-wise UEP problem are given in terms of the $J(R)$ function defined below. The $J(R)$ function was first
introduced by¹⁹ Kudryashov [7, equation (2.6)] while describing the performance of non-block variable length codes
with feedback and delay constraints. Later the $J(R)$ function was used in [2] for describing the performance of block
codes in the single message message-wise UEP problem. It is shown in [2, Appendix D] that both fixed length
block codes without feedback and variable length block codes with feedback on DMCs satisfy

\[ E_{md}(R, 0) = J(R). \tag{16} \]

Recently Nazer, Shkel and Draper obtained the closed form expression for $E_{md}(R, 0)$ for fixed length block codes
on the additive white Gaussian noise channel, under certain average and peak power constraints [9, Theorem 1].
Curiously, the equality given in (16) holds in that case too²⁰.

Definition 12: For any $R \in [-\infty, C]$, $J(R)$ is defined as

\[ J(R) = \max_{\substack{\alpha, x_1, x_2, \mu_1, \mu_2 : \\ \alpha I(\mu_1, W) + (1 - \alpha) I(\mu_2, W) \geq R \\ 0 \leq \alpha \leq 1,\ x_1, x_2 \in \mathcal{X},\ \mu_1, \mu_2 \in \mathcal{P}(\mathcal{X})}} \alpha \mathrm{D}(\bar{\mu}_1 \| W_{x_1}) + (1 - \alpha) \mathrm{D}(\bar{\mu}_2 \| W_{x_2}) \tag{17} \]

where $\bar{\mu}_i(y) = \sum_x W_x(y) \mu_i(x)$ for $i = 1, 2$, and $I(\mu, W)$ denotes the mutual information expression maximized in (8).

We have plotted the J(R) function for Binary Symmetric Channels (BSCs) with various crossover probabilities
in Figure 1. Note that as the channel becomes noisier, i.e., as the crossover probability becomes closer to 1/2, the
value of the J(R) function decreases at all values of rate where it is positive. Furthermore the highest value of rate
where it is positive, i.e., the channel capacity, decreases.

19 In [7, equation (2.6)] there is no optimization over the parameter $\alpha$. Thus strictly speaking, what is introduced in [7, equation (2.6)] is
j(R) given in equation (64) rather than J(R) given in (17).

20 Unlike DMCs, for these channels it is possible to obtain a closed form expression in terms of the rate and the power constraints.
21 Recall that in a binary symmetric channel with crossover probability p, $\mathcal{X} = \{0, 1\}$, $\mathcal{Y} = \{0, 1\}$ and
$W_x(y) = (1 - p) 1_{\{x = y\}} + p 1_{\{x \ne y\}}$.
Fig. 2. $E_{md}(R, E)$ is drawn at various values of the error exponent E as a function of rate R for a BSC with crossover probability p = 0.01.
Note that when p = 0.01, C = 0.6371 nats per channel use and D = 4.503. As we increase the exponent of the average error probability,
i.e., E, the value of $E_{md}(R, E)$ decreases, as one would expect.
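The two constants quoted in the caption can be reproduced with a few lines of Python. This is only an illustrative sketch: it assumes that C is the BSC capacity in nats and that D is the largest Kullback-Leibler divergence between the output distributions of two input letters, which for a BSC equals $(1 - 2p) \ln\frac{1-p}{p}$. The function names are hypothetical.

```python
import math

def bsc_capacity_nats(p):
    """Capacity of a BSC with crossover probability p, in nats per channel use."""
    binary_entropy = -p * math.log(p) - (1 - p) * math.log(1 - p)
    return math.log(2) - binary_entropy

def bsc_max_divergence(p):
    """Assumed definition of D: max over input pairs of D(W_x || W_x'), in nats."""
    return (1 - 2 * p) * math.log((1 - p) / p)

C = bsc_capacity_nats(0.01)
D = bsc_max_divergence(0.01)
print(round(C, 4), round(D, 3))  # 0.6371 4.503
```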



Lemma 1: The function J(R) defined in equation (17) is a concave, decreasing function such that J(R) = D
for $R \le 0$.

Proof of Lemma 1 is given in Appendix A.

Now let us consider the single message message-wise UEP problem given in Definition 10.
Theorem 1: For any rate $0 \le R \le C$ and error exponent $E \le (1 - \frac{R}{C}) D$, the missed detection exponent $E_{md}(R, E)$
defined in equation (13) is equal to

\[ E_{md}(R, E) = E + \Big(1 - \frac{E}{D}\Big) J\Big(\frac{R}{1 - E/D}\Big) \tag{18} \]

where C and D are as defined previously and J(·) is given in equation (17). Furthermore $E_{md}(R, E)$ is jointly concave
in (R, E) pairs.

We have plotted $E_{md}(R, E)$ as a function of rate, for various values of E, in Figure 2. When the rate is zero, the
exponent of the average error probability can be made as high as D. Thus all the curves meet at the (0, D) point. But
for all positive rates the exponent of the average error probability makes a difference; as E increases $E_{md}(R, E)$
decreases. Furthermore for any given rate R the exponent of the average error probability can only be as high as
$(1 - \frac{R}{C}) D$. This is why the curves corresponding to higher values of E have smaller support on the rate axis.
Proof of Theorem Q] is presented in Appendix U 
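The expression in Theorem 1 is easy to evaluate once a J(·) function is available. The sketch below takes J as a callable and applies the boundary convention for R = 0, E = D. The function `make_toy_J` is a hypothetical linear stand-in used only to exercise the formula; it is not the J(R) of Definition 12.

```python
def make_toy_J(C, D):
    """Hypothetical stand-in for J: concave, decreasing, equal to D for R <= 0."""
    return lambda r: D * (1 - max(r, 0.0) / C)

def missed_detection_exponent(R, E, C, D, J):
    """Evaluate E + (1 - E/D) * J(R / (1 - E/D)), the expression in Theorem 1."""
    if not (0 <= R <= C) or E > (1 - R / C) * D:
        raise ValueError("need 0 <= R <= C and E <= (1 - R/C) * D")
    scale = 1 - E / D
    if scale == 0:
        # Only reachable at R = 0, E = D; the second term is interpreted as 0.
        return E
    return E + scale * J(R / scale)

C, D = 0.6371, 4.503
J = make_toy_J(C, D)
at_zero_rate = missed_detection_exponent(0.0, D, C, D, J)   # all curves meet at (0, D)
no_error_exponent = missed_detection_exponent(0.3, 0.0, C, D, J)  # reduces to J(R)
```

With E = 0 the expression reduces to J(R), in agreement with equation (16).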

Similar to the single message message-wise UEP problem, the solution of the bit-wise UEP problem is given in 
terms of the J(R) function. 

22 For the case when R = 0 and E = D the $(1 - \frac{E}{D}) J(\frac{R}{1 - E/D})$ term should be interpreted as 0, i.e., $(1 - \frac{E}{D}) J(\frac{R}{1 - E/D}) \big|_{R = 0, E = D} = 0$.
[Figure 3 legend: curves for $(R_1, R_2) = (C/3, C/4)$, $(C/3, C/2)$ and $(2C/3, C/4)$.]

Fig. 3. $E_1(R_1, R_2, E_2)$ is drawn for various values of the rate pair $(R_1, R_2)$ as a function of the error exponent $E_2$ for a BSC with crossover probability
p = 0.01. Recall that when p = 0.01, C = 0.6371 nats per channel use and D = 4.503.



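Theorem 2: A rates-exponents vector $(\vec{R}, \vec{E})$ is achievable if and only if there exists an $\vec{\eta}$ such that

\begin{align}
E_i &\le \Big(1 - \sum_{j=1}^{\ell} \eta_j\Big) D + \sum_{j=i+1}^{\ell} \eta_j J\Big(\frac{R_j}{\eta_j}\Big) && \forall i \in \{1, 2, \ldots, \ell\} \tag{19a} \\
R_i &\le C \eta_i && \forall i \in \{1, 2, \ldots, \ell\} \tag{19b} \\
\eta_i &\ge 0 && \forall i \in \{1, 2, \ldots, \ell\} \tag{19c} \\
\sum_{j=1}^{\ell} \eta_j &\le 1 \tag{19d}
\end{align}

where C and D are as defined previously and J(·) is given in equation (17). Furthermore the set of all achievable
rates-exponents vectors is convex.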

Proof of Theorem |2] is presented in Appendix [J] 
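For a candidate time sharing vector, the conditions of Theorem 2 can be checked mechanically. The sketch below does so for 0-indexed lists and applies the convention that the term $\eta_j J(R_j / \eta_j)$ is zero when $\eta_j = 0$. `make_toy_J` is again a hypothetical stand-in for J, and the tolerance parameter is an implementation convenience, not part of the theorem.

```python
def make_toy_J(C, D):
    """Hypothetical stand-in for J: concave, decreasing, equal to D for R <= 0."""
    return lambda r: D * (1 - max(r, 0.0) / C)

def satisfies_theorem2(R, E, eta, C, D, J, tol=1e-9):
    """Check constraints (19a)-(19d) for rate, exponent and time sharing vectors."""
    ell = len(R)
    if len(E) != ell or len(eta) != ell:
        raise ValueError("R, E and eta must have the same length")
    if any(h < 0 for h in eta):                                # (19c)
        return False
    if sum(eta) > 1 + tol:                                     # (19d)
        return False
    if any(R[i] > C * eta[i] + tol for i in range(ell)):       # (19b)
        return False

    def term(j):
        # eta_j * J(R_j / eta_j), interpreted as 0 when eta_j = 0
        return 0.0 if eta[j] == 0 else eta[j] * J(R[j] / eta[j])

    control_share = (1 - sum(eta)) * D
    for i in range(ell):                                       # (19a)
        bound = control_share + sum(term(j) for j in range(i + 1, ell))
        if E[i] > bound + tol:
            return False
    return True

C, D = 0.6371, 4.503
J = make_toy_J(C, D)
rates = [C / 3, C / 4]
shares = [1 / 3, 5 / 12]
feasible = satisfies_theorem2(rates, [5 * D / 12, D / 4], shares, C, D, J)
```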

For the special case when there are only two sub-messages, the condition given in Theorem 2 for the achievability
of a rate vector, error exponent vector pair can be turned into an analytical expression for the optimal $E_1$ in terms
of $R_1$, $R_2$ and $E_2$. In order to see why, note that revealing the region of achievable $(R_1, R_2, E_1, E_2)$ vectors is
equivalent to revealing the region of achievable $(R_1, R_2, E_2)$'s and the value of the maximum achievable $E_1$ for all
the $(R_1, R_2, E_2)$'s in the achievable region.

Corollary 1: For any rate pair $(R_1, R_2)$ such that $R_1 + R_2 \le C$ and error exponent $E_2$ such that
$E_2 \le (1 - \frac{R_1 + R_2}{C}) D$, the optimal value of $E_1$ is given by

\[ E_1(R_1, R_2, E_2) = E_2 + \Big(1 - \frac{R_1}{C} - \frac{E_2}{D}\Big) J\Big(\frac{R_2}{1 - \frac{R_1}{C} - \frac{E_2}{D}}\Big) \tag{20} \]

where C and D are as defined previously and J(·) is given in equation (17). Furthermore $E_1(R_1, R_2, E_2)$ is concave
in $(R_1, R_2, E_2)$.
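The closed form of Corollary 1 corresponds to choosing $\eta_1 = R_1 / C$ and $\eta_2 = 1 - R_1/C - E_2/D$ in Theorem 2. A minimal sketch of its evaluation follows; `make_toy_J` is a hypothetical stand-in for J, so the numbers it produces illustrate the formula only, not the curves of Figure 3.

```python
def make_toy_J(C, D):
    """Hypothetical stand-in for J: concave, decreasing, equal to D for R <= 0."""
    return lambda r: D * (1 - max(r, 0.0) / C)

def optimal_first_exponent(R1, R2, E2, C, D, J):
    """Evaluate the right hand side of equation (20)."""
    if R1 < 0 or R2 < 0 or R1 + R2 > C:
        raise ValueError("need R1, R2 >= 0 and R1 + R2 <= C")
    if E2 > (1 - (R1 + R2) / C) * D:
        raise ValueError("need E2 <= (1 - (R1 + R2)/C) * D")
    eta2 = 1 - R1 / C - E2 / D      # time share left for the second sub-message
    if eta2 <= 0:
        # R2 = 0 and E2 = (1 - R1/C) D: the second term is interpreted as zero.
        return E2
    return E2 + eta2 * J(R2 / eta2)

C, D = 0.6371, 4.503
J = make_toy_J(C, D)
E1 = optimal_first_exponent(C / 3, C / 4, D / 4, C, D, J)
```

For this stand-in J the result is $5D/12$, which is strictly larger than $E_2 = D/4$.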



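23 For the case when $R_j = 0$ and $\eta_j = 0$ the $\eta_j J(\frac{R_j}{\eta_j})$ term should be interpreted as 0, i.e., $\eta_j J(\frac{R_j}{\eta_j}) \big|_{R_j = 0, \eta_j = 0} = 0$.

24 For the case when $R_2 = 0$ and $E_2 = (1 - \frac{R_1}{C}) D$, the second term on the right hand side of equation (20) should be interpreted as zero,
i.e., $\big(1 - \frac{R_1}{C} - \frac{E_2}{D}\big) J\Big(\frac{R_2}{1 - \frac{R_1}{C} - \frac{E_2}{D}}\Big) \Big|_{R_2 = 0, E_2 = (1 - \frac{R_1}{C}) D} = 0$.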
Note that for the $E_1(R_1, R_2, E_2)$ given in equation (20), $E_1(R_1, R_2, E_2) \ge E_2$ for all $(R_1, R_2, E_2)$ triples such
that $R_1 + R_2 \le C$ and $E_2 \le (1 - \frac{R_1 + R_2}{C}) D$. Furthermore the inequality is strict as long as $R_2 > 0$. We have drawn
$E_1(R_1, R_2, E_2)$ for various $(R_1, R_2)$ pairs as a function of $E_2$ in Figure 3.

IV. ACHIEVABILITY

In both the single message message-wise UEP problem and the bit-wise UEP problem, the codes that achieve 
the optimal performance employ a number of different ideas at the same time. In order to avoid introducing all of 
those ideas at once, we first describe two families of codes and analyze the probabilities of various error events 
in those two families of codes. Later we use those two families of codes as the building blocks for the codes that 
achieve the optimal performance in the UEP problems we are interested in. Before going into a more detailed 
description and analysis of those codes, let us first give a bird's-eye plan for this section.

(a) A Single Message Message-wise UEP Scheme without Feedback: First in Section IV-A we consider a family of
fixed length codes without feedback. We prove that these codes can achieve any rate R less than channel capacity,
with vanishing error probability $P_e$ while having a minimum conditional error probability, $\min_m P_{e|m}$, as low
as $e^{-nJ(R)}$. The main drawback of this family of codes is that the decay rate of the average error probability
$P_e$ has to be subexponential.

(b) Control Phase and Error-Erasure Decoding: In Section IV-B, in order to obtain non-zero exponential decay for
the average error probability, we use a method introduced by Yamamoto and Itoh in [12]. We append the fixed
length codes described in Section IV-A with a control phase and use an error-erasure decoder. This new family
of codes with control phase and error-erasure decoding is shown, in Section IV-B, to achieve any rate R less
than the channel capacity C with exponentially decaying average error probability $P_e$, exponentially decaying
minimum conditional error probability $\min_m P_{e|m}$ and vanishing erasure probability, $P_{\mathbf{x}}$.

(c) Single Message Message-wise UEP for Variable Length Codes: In Section IV-C we obtain variable length
codes for the single message message-wise UEP problem using the codes described in Section IV-B. In order to
do that we use the fixed length codes with feedback and erasures described in Section IV-B repetitively until
a non-erasure decoding happens. This idea, too, was employed by Yamamoto and Itoh in [12].

(d) Bit-wise UEP for Variable Length Codes: In Section IV-D we first use the codes described in Section IV-A and
the control phase discussed in Section IV-B to obtain a family of fixed length codes with feedback and erasures
which has bit-wise UEP, i.e., which has different bounds on error probabilities for different sub-messages.
While using the codes described in Section IV-A we employ an implicit acceptance explicit rejection scheme
first introduced in [7] by Kudryashov. Once we obtain a fixed length code with erasures and bit-wise UEP, we
use a repeat at erasures scheme like the one described in Section IV-C to obtain a variable length code with
bit-wise UEP.

The achievability results we derive in this section are revealed to be the optimal ones, in terms of the decay rates
of error probabilities with expected decoding time E[T], as a result of the outer bounds we derive in Section V.

A. A Single Message Message-wise UEP Scheme without Feedback 

In this subsection we describe a family of fixed length block codes without feedback that achieves any rate R
less than capacity with small error probability while having an exponentially small $\min_m P_{e|m}$, for sufficiently large
block length n. We describe these codes in terms of a time sharing constant $\alpha \in [0, 1]$, two input letters $x_1, x_2 \in \mathcal{X}$
and two probability distributions on the input alphabet, $\mu_1, \mu_2 \in \mathcal{P}(\mathcal{X})$.

In order to point out that a certain sequence of input letters is a codeword or part of a codeword for message m,
we put (m) after it. Hence we denote the codeword for m by $x^n(m)$ in a given code and by $X^n(m)$ in a code
ensemble, as a random quantity.

Let us start with describing the encoding scheme. The codeword of the first message, i.e., $x^n(1)$, is $x_1$ in the first $n_\alpha =
\lfloor \alpha n \rfloor$ time instances and $x_2$ in the rest, i.e., $x_\tau(1) = x_1$ for $\tau = 1, \ldots, n_\alpha$ and $x_\tau(1) = x_2$ for $\tau = (n_\alpha + 1), \ldots, n$.
The codewords of the other messages are described via a random coding argument. In the ensemble of codes we are
considering, all entries of all codewords other than the first codeword, i.e., $X_\tau(m)$ $\forall \tau \in [1, n]$, $\forall m \ne 1$, are generated
independently of other codewords and other entries of the same codeword. In the first $n_\alpha$ time instances $X_\tau(m)$ is



Vanishing with increasing block length. 
generated using $\mu_1$, in the rest using $\mu_2$, i.e., $P[X_\tau(m) = x] = \mu_1(x)$ for $\tau = 1, \ldots, n_\alpha$ and $P[X_\tau(m) = x] = \mu_2(x)$
for $\tau = (n_\alpha + 1), \ldots, n$.
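The encoding rule above can be sketched as a random codebook generator. This is an illustration under stated assumptions: the rounding $n_\alpha = \lfloor \alpha n \rfloor$ is assumed, input distributions are passed as dictionaries from letters to probabilities, and the function name and argument order are hypothetical.

```python
import math
import random

def build_codebook(n, num_messages, alpha, x1, x2, mu1, mu2, rng):
    """Return a list of codewords; index 0 holds the codeword of the first message."""
    n_alpha = math.floor(alpha * n)   # assumed rounding for the first phase length

    def draw(mu, count):
        # Draw `count` letters independently according to the distribution mu.
        return rng.choices(list(mu.keys()), weights=list(mu.values()), k=count)

    # First message: x1 in the first n_alpha positions, x2 in the remaining ones.
    codebook = [[x1] * n_alpha + [x2] * (n - n_alpha)]
    # Other messages: entries drawn independently, mu1 first and then mu2.
    for _ in range(num_messages - 1):
        codebook.append(draw(mu1, n_alpha) + draw(mu2, n - n_alpha))
    return codebook

rng = random.Random(0)
codebook = build_codebook(8, 4, 0.25, 0, 1, {0: 0.5, 1: 0.5}, {0: 0.2, 1: 0.8}, rng)
```

Only the first codeword is deterministic; the remaining codewords form one sample from the ensemble that the random coding argument averages over.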

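Let us begin the description of the decoding scheme by specifying the decoding region of the first message,
$\mathcal{G}[1]$: it is the set of all output sequences $y^n$ whose empirical distribution is not typical with $(\alpha, \bar{\mu}_1, \bar{\mu}_2)$. More
precisely, the decoding region of the first message, $\mathcal{G}[1]$, is given by

\[ \mathcal{G}[1] = \Big\{ y^n : n_\alpha \Delta\big(Q_{\{y_1^{n_\alpha}\}}, \bar{\mu}_1\big) + (n - n_\alpha) \Delta\big(Q_{\{y_{n_\alpha+1}^{n}\}}, \bar{\mu}_2\big) \ge |\mathcal{X}||\mathcal{Y}| \sqrt{n \ln(1 + n)} \Big\} \tag{21} \]

where $\Delta$ is the total variation distance, $Q_{\{y_1^{n_\alpha}\}}$ and $Q_{\{y_{n_\alpha+1}^{n}\}}$ are the empirical distributions
of $y_1^{n_\alpha}$ and $y_{n_\alpha+1}^{n}$, both as defined earlier, and $\bar{\mu}_1$ and $\bar{\mu}_2$ are probability distributions on $\mathcal{Y}$, i.e., $\bar{\mu}_1, \bar{\mu}_2 \in \mathcal{P}(\mathcal{Y})$,
such that $\bar{\mu}_i(y) = \sum_x \mu_i(x) W_x(y)$.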

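For other messages, $m \ne 1$, the decoding regions $\mathcal{G}[m]$ are the set of all output sequences for which $Q_{\{x^n(m), y^n\}}$ is
typical with $(\alpha, \mu_1 W, \mu_2 W)$ and $Q_{\{x^n(\tilde{m}), y^n\}}$ is not typical with $(\alpha, \mu_1 W, \mu_2 W)$ for any $\tilde{m} \ne m$. To be precise,
the decoding regions of the messages other than the first message are

\[ \mathcal{G}[m] = \mathcal{B}[x^n(m)] \cap \Big( \bigcap_{\tilde{m} \ne m} \overline{\mathcal{B}[x^n(\tilde{m})]} \Big) \qquad \forall m \in \{2, 3, \ldots, |\mathcal{M}|\} \tag{22} \]

where for all $x^n \in \mathcal{X}^n$, $\mathcal{B}[x^n]$ is the set of all $y^n$'s for which $(x^n, y^n)$ is typical with $(\alpha, \mu_1 W, \mu_2 W)$:

\[ \mathcal{B}[x^n] = \Big\{ y^n : n_\alpha \Delta\big(Q_{\{x_1^{n_\alpha}, y_1^{n_\alpha}\}}, \mu_1 W\big) + (n - n_\alpha) \Delta\big(Q_{\{x_{n_\alpha+1}^{n}, y_{n_\alpha+1}^{n}\}}, \mu_2 W\big) \le |\mathcal{X}||\mathcal{Y}| \sqrt{n \ln(1 + n)} \Big\} \tag{23} \]

where $\Delta$ is the total variation distance, $Q_{\{x_1^{n_\alpha}, y_1^{n_\alpha}\}}$ and $Q_{\{x_{n_\alpha+1}^{n}, y_{n_\alpha+1}^{n}\}}$ are the empirical
distributions of $(x_1^{n_\alpha}, y_1^{n_\alpha})$ and $(x_{n_\alpha+1}^{n}, y_{n_\alpha+1}^{n})$, both as defined earlier, and $\mu_1 W$ and $\mu_2 W$ are probability distributions
on $\mathcal{X} \times \mathcal{Y}$, i.e., $\mu_1 W \in \mathcal{P}(\mathcal{X} \times \mathcal{Y})$ and $\mu_2 W \in \mathcal{P}(\mathcal{X} \times \mathcal{Y})$.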

In Appendix B we have analyzed the conditional error probabilities, $P_{e|m}$, for the above described code and
proved Lemma 2 given below.

Lemma 2: For any block length n, time sharing constant $\alpha \in [0, 1]$, input letters $x_1, x_2 \in \mathcal{X}$ and input distributions
$\mu_1, \mu_2 \in \mathcal{P}(\mathcal{X})$ there exists a length n block code such that



\[ P_{e|1} \le e^{-n(\alpha D(\bar{\mu}_1 \| W_{x_1}) + (1 - \alpha) D(\bar{\mu}_2 \| W_{x_2}) - \epsilon_n)} \]

\[ P_{e|m} \le \epsilon_n \qquad m = 2, 3, \ldots, |\mathcal{M}| \]



where Ai(y) = £ x W x (y)fa(x), fa(y) = £ x W x (y)^ 2 (x) and e n = 

Given the channel W, if we discard the error terms $\epsilon_n$, then for a given value of rate, $0 \le R \le C$, we can
optimize the exponent of $P_{e|1}$ over the time sharing constant $\alpha$, the input letters $x_1, x_2$ and the input distributions $\mu_1, \mu_2$.
Evidently the optimization problem we get is the one given in the definition of J(R), in equation (17). Thus
Lemma 2 implies that for any $R \in [0, C]$ and block length n there exists a length n code such that $|\mathcal{M}| \ge e^{n(R - \epsilon_n)}$,
$P_{e|m} \le \epsilon_n$ for $m = 2, 3, \ldots, |\mathcal{M}|$ and $P_{e|1} \le e^{-n(J(R) - \epsilon_n)}$.

One curious question is whether or not the exponent of $P_{e|1}$ can be increased by including more than two phases.
Carathéodory's Theorem answers that question negatively, i.e., to obtain the largest value of J(R) one doesn't need
to do time sharing between more than two input-letter-input-distribution pairs.



B. Control Phase and Error-Erasure Decoding: 

The family of codes described in Lemma 2 has a large exponent for the conditional error probability of the first
message, i.e., $P_{e|1}$. But the conditional error probabilities of the other messages, $P_{e|m}$ for $m \ne 1$, decay subexponentially.

In order to facilitate an exponential decay of $P_{e|m}$ for $m \ne 1$ with block length, we append the codes described in
Lemma 2 with a control phase and allow erasures. The idea of using a control phase and error-erasure decoding
in establishing achievability results for variable length codes was first employed by Yamamoto and Itoh in [12].

In order to explain what we mean by the control phase, let us describe our encoding scheme and decoding rule
briefly. First a code from the family of codes described in Section IV-A is used to transmit M and the receiver
makes a tentative decision $\tilde{M}$ using the decoder of the very same code. The transmitter knows $\tilde{M}$ because of the
feedback link. In the remaining time instances, i.e., in the control phase, the transmitter sends the input letter a if
$\tilde{M} = M$, the input letter r if $\tilde{M} \ne M$. The input letters a and r are described in equation (10). At the end of the
control phase, the receiver checks whether or not the output sequence in the control phase is typical with $W_a$; if
it is, then $\hat{M} = \tilde{M}$, otherwise an erasure is declared.

Lemma 3 given below states the results of the performance analysis of the above described code. In order to
understand what is stated in Lemma 3 accurately, let us make a brief digression and elaborate on codes with
erasures. We have assumed in our models until now that $\hat{M} \in \mathcal{M}$. However, there are many interesting problems
in which this might not hold. In codes with erasures, for example, we replace $\hat{M} \in \mathcal{M}$ with $\hat{M} \in \hat{\mathcal{M}}$ where
$\hat{\mathcal{M}} = \mathcal{M} \cup \{\mathbf{x}\}$ and $\mathbf{x}$ is the erasure symbol. Furthermore in codes with erasures, for each $m \in \mathcal{M}$ the conditional
error probability $P_{e|m}$ and the conditional erasure probability $P_{\mathbf{x}|m}$ are defined as follows.



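\begin{align}
P_{e|m} &= P\big[\hat{M} \notin \{m, \mathbf{x}\} \,\big|\, M = m\big] && m = 1, 2, \ldots, |\mathcal{M}| \tag{24a} \\
P_{\mathbf{x}|m} &= P\big[\hat{M} = \mathbf{x} \,\big|\, M = m\big] && m = 1, 2, \ldots, |\mathcal{M}| \tag{24b}
\end{align}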



Note that the definitions of $P_{e|m}$ and $P_{\mathbf{x}|m}$ given above can be seen as generalizations of the corresponding definitions
in block codes without erasures. In erasure free codes the above definitions are equivalent to the corresponding definitions
there.

Lemma 3: For any block length n, rate $0 \le R \le C$ and error exponent $0 \le E \le (1 - \frac{R}{C}) D$, there exists a length
n block code with erasures such that,

\M\ > e n ( R - £ ") 



P B \i < e 



P e \ m < £ n min{l,e 



-n(E-e n ) 



} 



P 



x|m 



<e n + e 



-"((i-f). 



s l—E/D j 



m 



m 



2,3, 



1,2, 



\M\ 
\M\ 



where e n = ^^Wjih^±g. 

Proof of Lemma 3 is given in Appendix C.

Note that in Lemma |3l unlike P e | m 's which decrease exponentially with n, P x | m 's decays as It is possible 
to tweak the proof so as to have a non-zero exponent for P x | m 's, see HI. But this can only be done at the expanse 
of the $P_{e|m}$'s. Our aim, however, is achieving the optimal performance in variable length block codes. As we will see
in the following subsection, what matters for that is the exponents of the error probabilities and having vanishing erasure
probabilities. The rate at which the erasure probability decays does not affect the performance of variable length block
codes in terms of error exponents.



C. Single Message Message-Wise UEP Achievablity: 

In this section we construct variable length block codes for the single message message-wise UEP problem using
Lemma 3. In the first n time units the variable length encoding scheme uses a fixed length block code with erasures
which has the performance described in Lemma 3. If the decoded message of the fixed length code is in the message
set, i.e., if $\hat{M} \in \mathcal{M}$, then the decoded message of the fixed length code becomes the decoded message of the variable
length code. If the decoded message of the fixed length code is the erasure symbol, i.e., if $\hat{M} = \mathbf{x}$, then the encoder
uses the fixed length code again in the second n time units. By repeating this scheme until the decoded message
of the fixed length code is in $\mathcal{M}$, i.e., $\hat{M} \in \mathcal{M}$, we obtain a variable length code.

Let L be the number of times the fixed length code is used until an $\hat{M} \in \mathcal{M}$ is observed. Then given the message M,
L is a geometrically distributed random variable with success probability $(1 - P_{\mathbf{x}|M})$ where $P_{\mathbf{x}|M}$ is the conditional
erasure probability of the fixed length code given the message M. Then the conditional probability distribution and
the conditional expected value of L given M are

\begin{align}
P[L = l \,|\, M] &= (1 - P_{\mathbf{x}|M}) (P_{\mathbf{x}|M})^{l - 1} \qquad l = 1, 2, \ldots \tag{25a} \\
E[L \,|\, M] &= (1 - P_{\mathbf{x}|M})^{-1}. \tag{25b}
\end{align}
Furthermore the conditional expected value of the decoding time and the conditional error probability given the message
M are

\begin{align}
E[T \,|\, M] &= n E[L \,|\, M] \tag{26a} \\
P\big[\hat{M} \ne M \,\big|\, M\big] &= P_{e|M} E[L \,|\, M] \tag{26b}
\end{align}

where n is the block length of the fixed length code and $P_{e|M}$ is the conditional error probability given the message
M for the fixed length code.
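Equations (25b) and (26) can be checked numerically. The sketch below computes the expected decoding time and the error probability of the repeat-until-non-erasure scheme from the per-block error and erasure probabilities, and compares the closed form for the error probability with a direct summation over the number of repetitions. The function names and the sample numbers are illustrative assumptions.

```python
def variable_length_performance(n, p_error, p_erasure):
    """Return (expected decoding time, error probability) of the repeated scheme."""
    if not 0 <= p_erasure < 1:
        raise ValueError("erasure probability must lie in [0, 1)")
    expected_uses = 1.0 / (1.0 - p_erasure)           # equation (25b)
    return n * expected_uses, p_error * expected_uses  # equations (26a) and (26b)

def error_probability_by_series(p_error, p_erasure, terms=1000):
    """Sum over l of P[first l-1 blocks erased] * P[error in block l]."""
    return sum(p_erasure ** (l - 1) * p_error for l in range(1, terms + 1))

expected_time, error_probability = variable_length_performance(100, 1e-6, 0.1)
series_value = error_probability_by_series(1e-6, 0.1)
```

Because the erasure probability vanishes with n while the error probability decays exponentially, the factor $E[L | M]$ tends to one and does not change the exponents.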

Thus as a result of equations (25b), (26) and Lemma 3 we know that for any rate $R \in [0, C]$ and error exponent
$E \in [0, (1 - \frac{R}{C}) D]$ there exists a reliable sequence Q such that $R_Q = R$, $E_Q = E$ and

\[ E_{md,Q} = E + \Big(1 - \frac{E}{D}\Big) J\Big(\frac{R}{1 - E/D}\Big). \tag{27} \]

We show in Section V-C that for any reliable sequence Q with rate $R_Q = R$ and error exponent $E_Q = E$, $E_{md,Q}$
is upper bounded by the expression on the right hand side of equation (27).



D. Bit-Wise UEP Achievablity: 

In this section we first use the family of codes described in Section IV-A and the control phase idea described
in Section IV-B to construct fixed length block codes with erasures which have bit-wise UEP. Then we use them
with a repeat until non-erasure decoding scheme, similar to the one described in Section IV-C, to obtain variable
length block codes with bit-wise UEP.

Let us start with describing the encoding scheme for the fixed length block code with bit-wise UEP. If there are
$\ell$ sub-messages, i.e., if $\mathcal{M} = (\mathcal{M}_1 \times \mathcal{M}_2 \times \cdots \times \mathcal{M}_\ell)$, then the encoding scheme has $\ell + 1$ phases with lengths
$n_1, n_2, \ldots, n_{\ell+1}$ such that $n_1 + n_2 + \ldots + n_{\ell+1} = n$.

• In the first phase a length $n_1$ code from the family of codes described in Section IV-A is used. The message set
of the code is $\tilde{\mathcal{M}}_1 = \mathcal{M}_1 \cup \{|\mathcal{M}_1| + 1\}$ and the message $\tilde{M}_1$ of the code is determined by the first sub-message:
$\tilde{M}_1 = M_1 + 1$. At the end of the first phase the receiver uses the decoder of the length $n_1$ code to get a tentative
decision $\hat{\tilde{M}}_1$, which is known by the transmitter at the beginning of the second phase because of the feedback
link.

• In the second phase a length $n_2$ code from the family of codes described in Section IV-A, with the message
set $\tilde{\mathcal{M}}_2 = \mathcal{M}_2 \cup \{|\mathcal{M}_2| + 1\}$, is used. If $\tilde{M}_1$ is decoded correctly at the end of the first phase then the message
$\tilde{M}_2$ of the code used in the second phase is determined by the second sub-message as $\tilde{M}_2 = M_2 + 1$, else
$\tilde{M}_2 = 1$. At the end of the second phase the receiver uses the decoder of the second phase code to get the
tentative decision $\hat{\tilde{M}}_2$, which is known by the transmitter at the beginning of the third phase because of the
feedback link.

• In phases 3 to $\ell$ the above described scheme is used. In phase i, a length $n_i$ code, with the message set $\tilde{\mathcal{M}}_i =
\mathcal{M}_i \cup \{|\mathcal{M}_i| + 1\}$, from the family of codes described in Section IV-A is used. The message of the length $n_i$
code, $\tilde{M}_i$, is $M_i + 1$ if $\hat{\tilde{M}}_{i-1} = \tilde{M}_{i-1}$, 1 otherwise, for $i = 3, 4, \ldots, \ell$.

• The last phase is an $n_{\ell+1}$ long control phase, i.e., an $n_{\ell+1}$ long code with the message set $\tilde{\mathcal{M}}_{\ell+1} = \{1, 2\}$ is
used in the last phase. The codewords for the first and second messages are $n_{\ell+1}$ long sequences of input
letters r and a respectively, where r and a are described in equation (10). The tentative decision in the last
phase, $\hat{\tilde{M}}_{\ell+1}$, is equal to the first message if the output sequence in the last phase is not typical with $W_a$, the
second message otherwise. The message of the $n_{\ell+1}$ long code, $\tilde{M}_{\ell+1}$, is equal to 2 if $\hat{\tilde{M}}_\ell = \tilde{M}_\ell$, 1 otherwise.

Note that if we define $\tilde{M}_0$, $\hat{\tilde{M}}_0$ and $M_{\ell+1}$ all to be 1, i.e., $\tilde{M}_0 = \hat{\tilde{M}}_0 = M_{\ell+1} = 1$, we can write the following
rule for determining the $\tilde{M}_i$'s for i = 1 to $\ell + 1$:

\[ \tilde{M}_i = 1 + 1_{\{\hat{\tilde{M}}_{i-1} = \tilde{M}_{i-1}\}} M_i \qquad i = 1, 2, \ldots, (\ell + 1) \tag{28} \]

It is important however to keep in mind that the last phase is a control phase and the codes in the first $\ell$ phases
are from the family of codes described in Section IV-A.

Note that during the phases i = 2 to $\ell$ erroneous transmission of $\tilde{M}_{i-1}$ is conveyed using $\tilde{M}_i = 1$, hence the
transmission of $M_i$ through $\tilde{M}_i$, i.e., $\tilde{M}_i = 1 + M_i$, is a tacit approval of the tentative decision $\hat{\tilde{M}}_{i-1}$. Because
of this, the above encoding scheme is said to have an implicit acceptance explicit rejection property. The idea of
implicit acceptance explicit rejection was first introduced by Kudryashov in [7] in the context of non-block variable
length codes with feedback and delay constraints.

After finishing the description of the encoding scheme, we are ready to describe the decoding scheme. The
receiver determines the decoded message using the tentative decisions, $\hat{\tilde{M}}_i$ for i = 1 to $\ell + 1$. If one or more of
the tentative decisions are equal to 1, then an erasure is declared. If all $\ell + 1$ tentative decisions are different from
1 then $\hat{M}_i = \hat{\tilde{M}}_i - 1$ for all $i = 1, 2, \ldots, \ell$. Hence the decoding rule is

\[ \hat{M} = \begin{cases} (\hat{\tilde{M}}_1 - 1, \hat{\tilde{M}}_2 - 1, \ldots, \hat{\tilde{M}}_\ell - 1) & \text{if } \prod_{i=1}^{\ell+1} (\hat{\tilde{M}}_i - 1) > 0 \\ \mathbf{x} & \text{if } \prod_{i=1}^{\ell+1} (\hat{\tilde{M}}_i - 1) = 0 \end{cases} \tag{29} \]
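The message rule (28) and the decoding rule (29) can be traced with a small sketch. The inner codes are abstracted away: each phase delivers its message correctly unless a wrong tentative decision is injected through the hypothetical `corrupt` argument. All names below are illustrative and not taken from the paper.

```python
ERASURE = "x"

def phase_message(sub_message, previous_message, previous_decision):
    """Rule (28): send 1 + M_i if the previous tentative decision was correct, else 1."""
    accepted = previous_decision == previous_message
    return 1 + (sub_message if accepted else 0)

def decode(tentative_decisions):
    """Rule (29): erase if any of the ell + 1 tentative decisions equals 1."""
    if any(decision == 1 for decision in tentative_decisions):
        return ERASURE
    return tuple(decision - 1 for decision in tentative_decisions[:-1])

def run_scheme(sub_messages, corrupt=None):
    """corrupt maps a 1-based phase index to an injected wrong tentative decision."""
    corrupt = corrupt or {}
    previous_message, previous_decision = 1, 1      # convention for phase 0
    decisions = []
    # The control phase is phase ell + 1 with the convention M_{ell+1} = 1.
    for index, sub_message in enumerate(list(sub_messages) + [1], start=1):
        message = phase_message(sub_message, previous_message, previous_decision)
        decision = corrupt.get(index, message)       # error free unless overridden
        decisions.append(decision)
        previous_message, previous_decision = message, decision
    return decode(decisions)

clean = run_scheme((3, 1, 2))
rejected = run_scheme((3, 1, 2), corrupt={2: 5})
undetected = run_scheme((3, 1, 2), corrupt={3: 4, 4: 2})
```

In the second run the wrong decision in phase 2 is rejected explicitly in phase 3, so an erasure is declared. In the third run the control phase itself is decoded wrongly, which is the only way an error in the last sub-message goes undetected.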

For bit-wise UEP codes with erasures, the definition of $P_e(i)$ is slightly different from the original one given in
equation (14):

\[ P_e(i) = P\big[\hat{M}_i \ne M_i, \hat{M} \ne \mathbf{x}\big] \tag{30} \]

With this alternative definition in mind let us define $P_{e|m}(i)$ as the conditional probability of the erroneous
transmission of any one of the first i sub-messages when M = m:

\[ P_{e|m}(i) = P\big[\hat{M}^i \ne m^i, \hat{M} \ne \mathbf{x} \,\big|\, M = m\big] \tag{31} \]

The error analysis of the above described fixed length codes, presented in Appendix D, leads to Lemma 4 given
below.

Lemma 4: For block length n, any integer £ < ln( - 1 n +n ^ , rate vector R, and time sharing vector fj such that 



Ri < Cr, t 

m > o 

* — / «=i 

there exists a length n block code such that: 

\Mi\ > e n ( R '- £ - f ) 



where rj^ 



-P x |m < £ n,£ 



Vz G {1,2,...,, 
Vie{l,2,...,. 



Vi G {1,2,...^} 
Vz G {1,2,...*} 

Vm G At, i € {1,2, 
Vm G M. 



(32a) 
(32b) 

(32c) 



10|A-||y|ln(f)Vln(l+n) 



vt+z 



Recall the repeat at erasures scheme described in Section IV-C. If we use that scheme to obtain a variable
length code from the fixed length bit-wise UEP code described in Lemma 4, we obtain a variable length code with
UEP such that



E[T| M] 



M* ^ M' 



M 



< 



-Pe|IVl(0 



M 



1,2,...,! 



(33a) 
(33b) 



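As a result of equation (33) and Lemma 4 we know that for any rate vector $\vec{R}$, error exponent vector $\vec{E}$ and time
sharing vector $\vec{\eta}$ such that

\begin{align}
E_i &\le \Big(1 - \sum_{j=1}^{\ell} \eta_j\Big) D + \sum_{j=i+1}^{\ell} \eta_j J\Big(\frac{R_j}{\eta_j}\Big) && \forall i \in \{1, 2, \ldots, \ell\} \tag{34a} \\
R_i &\le C \eta_i && \forall i \in \{1, 2, \ldots, \ell\} \tag{34b} \\
\eta_i &\ge 0 && \forall i \in \{1, 2, \ldots, \ell\} \tag{34c} \\
\sum_{j=1}^{\ell} \eta_j &\le 1 \tag{34d}
\end{align}

there exists a reliable sequence Q such that $(\vec{R}_Q, \vec{E}_Q) = (\vec{R}, \vec{E})$. Thus the existence of a time sharing vector $\vec{\eta}$
satisfying the constraints given in (34) is a sufficient condition for the achievability of a rates-exponents vector $(\vec{R}, \vec{E})$.
We show in Section V-D that the existence of a time sharing vector $\vec{\eta}$ satisfying the constraints given in (34) is
also a necessary condition for the achievability of a rates-exponents vector $(\vec{R}, \vec{E})$.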
V. Converse

Berlin et al. [1] used the error probability of a random binary query posed at a stopping time for bounding the
error probability of a variable length block code. Later similar techniques have been applied in O for establishing 
outer bounds in UEP problems. Our approach is similar to that of 12 and 0; we, too, use error probabilities of 
random queries posed at stopping times for establishing outer bounds. Our approach, nevertheless, is novel because 
of the error events we choose to analyze and the bounding techniques we use. Furthermore, the relation we establish 
in Lemma [5] between the error probabilities and the decay rate of the conditional entropy of the messages with 
time is a brand new tool for UEP problems. 

For rigorously and unambiguously generalizing the technique used in 0]] and O we introduce the concept of 
anticipative list decoders in Section V-A. Then in Section V-B we bound the probabilities of certain error events
associated with anticipative list decoders from below. This bound, i.e., Lemma 5, is used in Sections V-C and V-D
to derive tight outer bounds for the performance of variable length block codes in the single message message-wise
UEP problem and in the bit-wise UEP problem, respectively.



A. Anticipative List Decoders 

In this section we first introduce the concepts of anticipative list decoders and non-trivial anticipative list decoders.
After that we show that for a given variable length code, any non-trivial anticipative list decoder $(\tilde{T}, A)$ can be used
to define a probability distribution, $P_{\{A\}}$, on $\mathcal{M} \times \mathcal{Y}^{T*}$. Finally we use $P_{\{A\}}$ to define the probability measure $P_{\{A\}}[\cdot]$
for the events in $\wp(\mathcal{M} \times \mathcal{Y}^{T})$. Both the non-trivial anticipative list decoders $(\tilde{T}, A)$ and the probability measures
$P_{\{A\}}[\cdot]$ associated with them play key roles in Lemma 5 of Section V-B.

An anticipative list decoder for a variable length code is a list decoder A that decodes at a stopping time $\tilde{T}$ that
is always less than or equal to the decoding time of the code T. The anticipative list decoders are used to formulate
questions about the transmitted message or the decoded message, in terms of a subset of the message set $\mathcal{M}$ that
is chosen at a stopping time $\tilde{T}$. For example let A be the set of all $m \in \mathcal{M}$ whose posterior probability at time
one is larger than $1/|\mathcal{M}|$. Evidently for all values of $Y_1$, A is a subset of $\mathcal{M}$, but it is not necessarily the same
subset for all values of $Y_1$. Indeed A is a function from $\mathcal{Y}^1$ to the power set of $\mathcal{M}$ and $(\tilde{T}, A)$ is an anticipative
list decoder, for which $\tilde{T} = 1$. The formal definition of anticipative list decoders is given below. In order to avoid
separate treatment in certain special cases we include the case when $\tilde{T} = 0$ and A is a fixed subset of $\mathcal{M}$ in the
definition.
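The example list decoder above, with the list chosen after the first channel output, can be made concrete with a short sketch. It assumes equally likely messages and a code whose first input letter is fixed for each message. The channel is given as a nested dictionary of transition probabilities, and all names are hypothetical.

```python
def posterior_after_first_output(first_letters, channel, y1):
    """Posterior of each message given Y_1 = y1, assuming a uniform prior."""
    likelihoods = [channel[x][y1] for x in first_letters]
    total = sum(likelihoods)
    return [value / total for value in likelihoods]

def list_after_first_output(first_letters, channel, y1):
    """Messages (numbered from 1) whose posterior at time one exceeds 1/|M|."""
    posterior = posterior_after_first_output(first_letters, channel, y1)
    threshold = 1.0 / len(first_letters)
    return {m for m, value in enumerate(posterior, start=1) if value > threshold}

# BSC with crossover probability 0.1; messages 1, 2 start with 0 and 3, 4 with 1.
bsc = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.1, 1: 0.9}}
first_letters = [0, 0, 1, 1]
list_for_zero = list_after_first_output(first_letters, bsc, 0)
list_for_one = list_after_first_output(first_letters, bsc, 1)
```

The list depends on the observed output, which is exactly why A is modeled as a function of the outputs up to the stopping time rather than as a fixed set.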

Definition 13 (Anticipative List Decoder): For a variable length code with decoding time T, a pair $(\tilde{T}, A)$ is
called an anticipative list decoder (ALD) if

• either $\tilde{T}$ is the constant random variable 0 and A is a fixed subset of $\mathcal{M}$, i.e.,

\[ \tilde{T} = 0, \qquad A \in \wp(\mathcal{M}) \]

• or $\tilde{T}$ is a stopping time, which is less than or equal to T with probability one, and A is a $\wp(\mathcal{M})$ valued function
defined on $\mathcal{Y}^{\tilde{T}}$, i.e.,

\[ P\big[\tilde{T} \le T\big] = 1, \qquad A : \mathcal{Y}^{\tilde{T}} \to \wp(\mathcal{M}). \]

The definition of ALD does not require A to be of some fixed size, nor does it require A to include more likely or less
likely messages. Thus for certain values of $Y^{\tilde{T}}$, A might not include any $m \in \mathcal{M}$ with positive posterior probability.
In other words for some values of $Y^{\tilde{T}}$ we might have

\[ P\big[M \in A(Y^{\tilde{T}}) \,\big|\, Y^{\tilde{T}} = y^{\tilde{t}}\big] = 0. \]

The ALD's in which such $y^{\tilde{t}}$'s have zero probability are called nontrivial ALD's.
Definition 14 (Nontrivial ALD): An anticipative list decoder $(\tilde{T}, A)$ is called a nontrivial anticipative list decoder
(NALD) if $P\big[M \in A(Y^{\tilde{T}}) \,\big|\, Y^{\tilde{T}}\big] > 0$ with probability one, i.e.,

\[ P\Big[ P\big[M \in A(Y^{\tilde{T}}) \,\big|\, Y^{\tilde{T}}\big] > 0 \Big] = 1. \tag{35} \]



Below, for any variable length code and an associated nontrivial anticipative list decoder $(\tilde{T}, A)$ we define a
probability distribution $P_{\{A\}}$ on $\mathcal{M} \times \mathcal{Y}^{T*}$ and a probability measure $P_{\{A\}}[\cdot]$ for the events in $\wp(\mathcal{M} \times \mathcal{Y}^{T})$. For
doing that first note that the probability measure generated by the code, i.e., $P[\cdot]$, can be used to define a probability
distribution P on $\mathcal{M} \times \mathcal{Y}^{T*}$ as follows:

\[ P(m, y^t) \triangleq P\big[M = m, Y^T = y^t\big] \qquad \forall m \in \mathcal{M}, y^t \in \mathcal{Y}^{T*} \tag{36} \]

where $\mathcal{Y}^{T*}$ is a countable set for any stopping time, given in equation (6b).

As T is a stopping time, the probability of any event $\Gamma$ in $\wp(\mathcal{M} \times \mathcal{Y}^{T})$ under $P[\cdot]$, i.e., $P[\Gamma]$, is equal to

\[ P[\Gamma] = \sum_{(m, y^t) \in \Gamma \cap (\mathcal{M} \times \mathcal{Y}^{T*})} P(m, y^t). \tag{37} \]

Evidently we can extend the definition of P and assume that P is zero whenever $y^t$ is in $\mathcal{Y}^{\{T = \infty\}}$, i.e.,

\[ P(m, y^t) \triangleq 0 \qquad \forall m \in \mathcal{M}, y^t \in \mathcal{Y}^{\{T = \infty\}}. \tag{38} \]

This extension is neither necessary nor relevant for calculating the probabilities of the events in $\wp(\mathcal{M} \times \mathcal{Y}^{T})$,
because T is a stopping time, i.e., $P[T < \infty] = 1$.

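Definition 15: Given a variable length code with decoding time T, for any NALD $(\tilde{T}, A)$ let $P_{\{A\}}$ be

\[ P_{\{A\}}(m, y^t) \triangleq P(y^{\tilde{t}}) \frac{P(m | y^{\tilde{t}}) 1_{\{m \in A(y^{\tilde{t}})\}}}{\sum_{\tilde{m} \in \mathcal{M}} P(\tilde{m} | y^{\tilde{t}}) 1_{\{\tilde{m} \in A(y^{\tilde{t}})\}}} P(y_{\tilde{t}+1}^{t} | y^{\tilde{t}}, m) \qquad \forall m \in \mathcal{M}, y^t \in \mathcal{Y}^{T*} \tag{39} \]

Note that Definition 15 is a parametric definition in the sense that it assigns a $P_{\{A\}}$ for all nontrivial anticipative
list decoders $(\tilde{T}, A)$. While proving outer bounds we will employ not one but multiple NALD's and use them in
conjunction with our new result, i.e., Lemma 5. But before introducing Lemma 5 let us elaborate on the relations
between marginal and conditional distributions of $P_{\{A\}}$ and P.
For $P_{\{A\}}$ defined in equation (39) we have

\[ \sum_{m \in \mathcal{M}, y^t \in \mathcal{Y}^{T*}} P_{\{A\}}(m, y^t) = 1. \]

Hence $P_{\{A\}}$ is a probability distribution on $\mathcal{M} \times \mathcal{Y}^{T*}$, i.e., $P_{\{A\}} \in \mathcal{P}(\mathcal{M} \times \mathcal{Y}^{T*})$.

Note that the marginal distributions of $P_{\{A\}}$ and P are the same on $\mathcal{Y}^{\tilde{T}*}$. Furthermore for all $y^{\tilde{t}} \in \mathcal{Y}^{\tilde{T}*}$ and $m \in \mathcal{M}$
the conditional distributions of $P_{\{A\}}$ and P are the same on $\mathcal{Y}^{T*}$. The probability distributions $P_{\{A\}}$ and P differ only
in their conditional distributions on $\mathcal{M}$ given $y^{\tilde{t}}$. More specifically,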

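\begin{align}
P_{\{A\}}(y^{\tilde{t}}) &= P(y^{\tilde{t}}) && \forall y^{\tilde{t}} \in \mathcal{Y}^{\tilde{T}*} \tag{40a} \\
P_{\{A\}}(m | y^{\tilde{t}}) &= \frac{P(m | y^{\tilde{t}}) 1_{\{m \in A(y^{\tilde{t}})\}}}{\sum_{\tilde{m} \in \mathcal{M}} P(\tilde{m} | y^{\tilde{t}}) 1_{\{\tilde{m} \in A(y^{\tilde{t}})\}}} && \forall y^{\tilde{t}} \in \mathcal{Y}^{\tilde{T}*}, \forall m \in \mathcal{M} \tag{40b} \\
P_{\{A\}}(y_{\tilde{t}+1}^{t} | y^{\tilde{t}}, m) &= P(y_{\tilde{t}+1}^{t} | y^{\tilde{t}}, m) && \forall y^{t} \in \mathcal{Y}^{T*}, \forall m \in \mathcal{M} \tag{40c}
\end{align}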
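The conditional distribution on the messages in (40b) is the original posterior restricted to the list and renormalized. A minimal sketch, with the posterior given as a dictionary from messages to probabilities (names are illustrative):

```python
def restrict_posterior_to_list(posterior, allowed):
    """Renormalize a posterior over messages after restricting it to the list."""
    mass = sum(p for m, p in posterior.items() if m in allowed)
    if mass <= 0:
        # The list has zero posterior mass here, so the decoder would not be
        # nontrivial at this output sequence and the ratio is undefined.
        raise ValueError("list has zero posterior probability")
    return {m: (p / mass if m in allowed else 0.0) for m, p in posterior.items()}

posterior = {1: 0.45, 2: 0.45, 3: 0.05, 4: 0.05}
restricted = restrict_posterior_to_list(posterior, {3, 4})
```

The nontriviality condition of Definition 14 is what guarantees that the normalizing mass is positive with probability one.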



26 There is a slight abuse of notation in equation (39); if $\tilde{T}$ is not a stopping time but rather a constant random variable $\tilde{T} = 0$, $P_{\{A\}}(m, y^t)$
should be interpreted as



r, / t \ A l{meyt} . t| \ 

P{^}(m,y )= rj, P(y |m) 



Vm e M,y e y 



T* 
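Using the parametric definition of the probability distribution $P_{\{A\}}$ on $\mathcal{M} \times \mathcal{Y}^{T*}$ we define a probability measure
$P_{\{A\}}[\cdot]$ for the events in $\wp(\mathcal{M} \times \mathcal{Y}^{T})$ as follows:

\[ P_{\{A\}}[\Gamma] \triangleq \sum_{(m, y^t) \in \Gamma \cap (\mathcal{M} \times \mathcal{Y}^{T*})} P_{\{A\}}(m, y^t) \qquad \forall \Gamma \in \wp(\mathcal{M} \times \mathcal{Y}^{T}). \tag{41} \]

Evidently we can extend the definition of $P_{\{A\}}$ to $\mathcal{M} \times \mathcal{Y}^{\{T = \infty\}}$ by defining it to be zero there, i.e.,

\[ P_{\{A\}}(m, y^t) \triangleq 0 \qquad \forall m \in \mathcal{M}, y^t \in \mathcal{Y}^{\{T = \infty\}}. \tag{42} \]

As in the case of P, this extension is neither necessary nor relevant for calculating the probabilities $P_{\{A\}}[\Gamma]$ given
in equation (41).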



B. Error Probability and Decay Rate of Entropy: 

In this section we lower bound the probability of the event that the decoded message $\hat{M}$ is not in A under the
probability measure $P_{\{A\}}[\cdot]$, i.e., $P_{\{A\}}\big[\hat{M} \notin A(Y^{\tilde{T}})\big]$. The bounds we derive depend on the decay rate of the
conditional entropy of the messages in the interval between $\tilde{T}$ and T.

Before even stating our bound, we need to specify what we mean by the conditional entropy of the messages.
While defining the conditional entropy, many authors take an average over the sample values of the conditioned
random variable and obtain a constant. We, however, do not take an average over the conditioned random variable
and define the conditional entropy as a random variable itself, which is a function of the random variable that is
conditioned on:

ml Yn In i, Yt1 . (43) 



H(M|Y T )= p t M 

meM 



Using the probability distribution P defined in equation 
equal to, 

H(M|Y T ) = E 



P[M=m|Y T 

we see that the conditional entropy defined in (l43l is 



In 



P(M|Y- 



Y T 



(44) 
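The distinction between conditional entropy as a random variable and as a constant can be made concrete with a small numerical sketch. The two-message repetition code, the binary symmetric channel and every function name below are illustrative assumptions, not objects from the paper. The sketch tabulates $H(M|Y^{\tau} = y^{\tau})$ for each output sequence and then averages it to recover the usual constant.

```python
import itertools
import math

def posterior(codebook, y, p):
    """Posterior P[M = m | Y = y] and P(y) for equiprobable messages sent over a
    BSC(p) with a fixed, feedback-free codebook (an illustrative special case)."""
    likes = []
    for cw in codebook:
        d = sum(a != b for a, b in zip(cw, y))  # Hamming distance to y
        likes.append((p ** d) * ((1 - p) ** (len(y) - d)))
    total = sum(likes)
    return [l / total for l in likes], total / len(codebook)

def entropy(dist):
    """Shannon entropy in nats."""
    return -sum(q * math.log(q) for q in dist if q > 0)

def entropy_as_random_variable(codebook, p):
    """Map each output sequence y to the pair (P(y), H(M | Y = y))."""
    n = len(codebook[0])
    table = {}
    for y in itertools.product((0, 1), repeat=n):
        post, py = posterior(codebook, y, p)
        table[y] = (py, entropy(post))
    return table

codebook = [(0, 0, 0), (1, 1, 1)]
table = entropy_as_random_variable(codebook, p=0.1)
# Averaging the random variable over y gives the constant many authors call H(M|Y).
average = sum(py * h for py, h in table.values())
```

In this toy example the value at the output sequence (0, 0, 0) is much smaller than the value at (0, 0, 1), which is exactly the sample-path dependence that the definition in (43) keeps and the averaged definition discards.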



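Lemma 5: For any variable length block code with finite expected decoding time, $E[T] < \infty$, let $(T_1, A_1)$, $(T_2, A_2)$, $\ldots$, $(T_k, A_k)$ be $k$ NALD's$^{28}$ such that

$$P[\{0 \le T_1 \le T_2 \le \cdots \le T_k \le T\}] = 1. \tag{45}$$

Then for all $i$ in $\{1, 2, \ldots, k\}$ such that $(P[M \in A_i(Y^{T_i})] + P_e) \le 1/2$ we have

$$P_{\{A_i\}}\left[\hat{M} \notin A_i(Y^{T_i})\right] \ge \exp\left(\frac{-h\left(P_e + P[M \in A_i(Y^{T_i})]\right) - \sum_{j=i+1}^{k+1} E[T_j - T_{j-1}]\, J(r_j)}{1 - P_e - P[M \in A_i(Y^{T_i})]}\right) \tag{46}$$

where $T_0 = 0$, $T_{k+1} = T$ and for all $j$ in $\{1, 2, \ldots, (k+1)\}$, $r_j$'s are given by

$$r_j = \begin{cases} 0 & \text{if } P[T_j = T_{j-1}] = 1 \\ \dfrac{E\left[H(M|Y^{T_{j-1}}) - H(M|Y^{T_j})\right]}{E[T_j - T_{j-1}]} & \text{if } P[T_j = T_{j-1}] < 1. \end{cases} \tag{47}$$

Proof of Lemma 5 is presented in Appendix E.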
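As a rough illustration of how the quantities in Lemma 5 fit together, the sketch below evaluates the right-hand side of (46) in logarithmic form, i.e. $\left(-h(P_e + P[M \in A_i]) - \sum_{j>i} E[T_j - T_{j-1}] J(r_j)\right) / \left(1 - P_e - P[M \in A_i]\right)$, from expected stopping times and expected conditional entropies. The function names, the argument layout and in particular `placeholder_J` are assumptions made for illustration; the true $J(\cdot)$ depends on the channel and must be supplied by the reader.

```python
import math

def binary_entropy(p):
    """Binary entropy h(p) in nats."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1 - p) * math.log(1 - p)

def entropy_decay_rates(exp_times, exp_entropies):
    """r_1, ..., r_{k+1} in the style of (47).

    exp_times[j]     = E[T_j] for j = 0..k+1 (T_0 = 0, T_{k+1} = T)
    exp_entropies[j] = E[H(M | Y^{T_j})] for j = 0..k+1
    Since T_j >= T_{j-1} with probability one, E[T_j - T_{j-1}] = 0 is
    the case P[T_j = T_{j-1}] = 1, for which r_j is set to zero.
    """
    rates = []
    for j in range(1, len(exp_times)):
        dt = exp_times[j] - exp_times[j - 1]
        dh = exp_entropies[j - 1] - exp_entropies[j]
        rates.append(dh / dt if dt > 0 else 0.0)
    return rates

def lemma5_log_lower_bound(i, exp_times, exp_entropies, p_e, p_in_A, J):
    """Lower bound on ln P_{A_i}[decoded message outside A_i]."""
    s = p_e + p_in_A
    if s > 0.5:
        raise ValueError("hypothesis P_e + P[M in A_i] <= 1/2 is violated")
    rates = entropy_decay_rates(exp_times, exp_entropies)
    total = 0.0
    for j in range(i + 1, len(exp_times)):  # j = i+1, ..., k+1
        total += (exp_times[j] - exp_times[j - 1]) * J(rates[j - 1])
    return (-binary_entropy(s) - total) / (1 - s)

def placeholder_J(rate, C=1.0, D=2.0):
    # Hypothetical stand-in, NOT the paper's J: non-increasing with J(0) = D.
    return D * max(0.0, 1.0 - rate / C)

# k = 2 with T_1 = 0: expected times and entropies are made-up numbers.
times = [0.0, 0.0, 60.0, 100.0]
entropies = [math.log(1024), math.log(1024), 0.5, 0.01]
bound = lemma5_log_lower_bound(1, times, entropies, 1e-6, 1 / 1024, placeholder_J)
```

Only the intervals after $T_i$ enter the sum, which is why the choice of $A_j$ for $j \neq i$ does not appear among the arguments.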

Before presenting the application of Lemma 5 in UEP problems, let us elaborate on its hypothesis and ramifications. We assumed that $(T_i, A_i)$ are all NALD. Thus for each $(T_i, A_i)$ the set of all $y^{t_i} \in \mathcal{Y}^{T_i*}$ such that the



27 Recall the standard notation in probability theory about the conditional expectations and conditional probabilities: Let $H$ be a real valued random variable and $G$ be a random quantity that takes values from a finite set $\mathcal{G}$, such that $P[G = g] > 0$ for all $g \in \mathcal{G}$. Then unlike $E[H]$, which is constant, $E[H|G]$ is a random variable. Thus an equation of the form $Z = E[H|G]$ implies not the equality of two constants but the equality of two random variables, i.e. it means that $z = E[H|G = g]$ for all $g \in \mathcal{G}$. Similarly let $\mathcal{H}_1$ be a set of sample values of the random variable $H$; then, unlike $P[H \in \mathcal{H}_1]$, which is a constant, $P[H \in \mathcal{H}_1|G]$ is a random variable. Equations (43) and (44) are such equations. Explaining conditional expectations and conditional probabilities is beyond the scope of this paper; readers who are not sufficiently fluent with these concepts are encouraged to read [10, Chapter I, Section 8], which deals with the case where random variables can take finitely many values. An appropriately generalized formal treatment of the subject in terms of sigma fields is presented in [10, Chapter II, Section 7].
28 Recall ALD's and NALD's are defined in Definitions 13 and 14 respectively.
transmitted message is guaranteed to be outside $A_i(y^{t_i})$, has zero probability and there is an associated probability measure $P_{\{A_i\}}[\cdot]$ given in equation (41). Furthermore $P_{\{A_i\}}[\hat{M} \notin A_i(Y^{T_i})]$ is the probability of the event that decoded message $\hat{M}$ is not in $A_i$ under the probability measure $P_{\{A_i\}}[\cdot]$.

Condition given in equation (45) ensures that the decoding times of the $k$ NALD's we are considering, $T_1$, $T_2$, $\ldots$, $T_k$, are reached in their indexing order and before the decoding time of the variable length code $T$. Any $T_1$, $T_2$, $\ldots$, $T_k$ satisfying equation (45) divides the time interval between $0$ and $T$ into $k+1$ disjoint intervals. The duration of these intervals as well as the decrease of the conditional entropy during them are random. For the $j^{th}$ interval the expected values of the duration and the decrease in the conditional entropy are given by $E[T_j - T_{j-1}]$ and $E[H(M|Y^{T_{j-1}}) - H(M|Y^{T_j})]$, respectively. Hence $r_j$'s defined in equation (47) are the rates of decrease of the conditional entropy of the messages per unit time in different intervals.

Lemma 5 bounds the probability of $\hat{M}$ being outside $A_i$ under $P_{\{A_i\}}[\cdot]$ from below in terms of $r_j$'s and $E[T_j - T_{j-1}]$'s for $j > i$. The bound on $P_{\{A_i\}}[\hat{M} \notin A_i(Y^{T_i})]$ also depends on $P[M \in A_i(Y^{T_i})]$ and $P_e$. But the particular choice of $A_j$'s for $j \neq i$ has no effect on the bound. This feature of the bound is its main merit over bounds resulting from the previously suggested techniques.



C. Single Message Message-Wise UEP Converse: 

In this section we bound the conditional error probabilities of the messages, i.e., $P_{e|m}$'s, from below uniformly over the message set $\mathcal{M}$ in a variable length block code with average error probability $P_e$, using Lemma 5. The resulting outer bound reveals that the inner bound we obtained in Section IV-C for the single message message-wise UEP problem is tight.

Consider a variable length block code with finite expected decoding time, i.e., $E[T] < \infty$. In order to bound
P e | m , defined in equation (TTTb . from below we apply Lemma [5] for k = 2 with (Ti,Ai), (T2,.4 2 ) given below. 



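Let $T_1$ be zero and $A_1$ be $\{m\}$, i.e.,

$$T_1 \triangleq 0 \tag{48}$$

$$A_1 \triangleq \{m\}. \tag{49}$$

Let $T_2$ be the first time instance before $T$ such that one message, not necessarily the one chosen for $A_1$, i.e., $m$, has a posteriori probability $1 - \delta$ or higher,

$$T_2 \triangleq \min\left\{\tau : \max_{\tilde{m}} P[M = \tilde{m}|Y^{\tau}] \ge (1 - \delta) \text{ or } \tau = T\right\}. \tag{50}$$

Let $A_2$ be the set of all messages whose posterior probability at time $T_2$ is less than $(1 - \delta)$,

$$A_2(Y^{T_2}) \triangleq \left\{\tilde{m} \in \mathcal{M} : P[M = \tilde{m}|Y^{T_2}] < (1 - \delta)\right\}. \tag{51}$$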
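The stopping time and the message set defined in (50) and (51) can be traced on a sample path. The following is a minimal sketch under simplifying assumptions: a binary symmetric channel with crossover probability `p`, a fixed codebook without feedback, and equiprobable messages. All names are hypothetical and chosen only for this illustration.

```python
def posterior_path(codebook, outputs, p):
    """Posteriors P[M = m | Y^t] for t = 0..len(outputs) over a BSC(p)."""
    num = len(codebook)
    post = [1.0 / num] * num
    path = [post]
    for t, y in enumerate(outputs):
        # Bayes update: multiply by the channel likelihood of the new output.
        w = [post[m] * ((1 - p) if codebook[m][t] == y else p) for m in range(num)]
        z = sum(w)
        post = [v / z for v in w]
        path.append(post)
    return path

def first_confident_time(path, delta, decoding_time):
    """Stopping time in the style of (50): the first t at which some message has
    posterior at least 1 - delta, or the decoding time if that never happens."""
    for t, post in enumerate(path[:decoding_time + 1]):
        if max(post) >= 1 - delta:
            return t
    return decoding_time

def low_posterior_set(post, delta):
    """Message set in the style of (51): messages with posterior below 1 - delta."""
    return {m for m, q in enumerate(post) if q < 1 - delta}

codebook = [(0, 0, 0, 0, 0), (1, 1, 1, 1, 1)]
path = posterior_path(codebook, outputs=(0, 0, 0, 0, 0), p=0.1)
t2 = first_confident_time(path, delta=0.05, decoding_time=5)
a2 = low_posterior_set(path[t2], delta=0.05)
```

With feedback the codeword symbol at time $t$ may depend on past outputs, but the Bayes update itself has the same form, so only `posterior_path` would need to change.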



We apply Lemma 5 for $(T_1, A_1)$ and $(T_2, A_2)$ given in equations (48), (49), (50) and (51). Then using the fact that $J(\cdot) \le D$ we get,

$$\ln P_{e|m} \ge \frac{-h\left(P_e + |\mathcal{M}|^{-1}\right) - E[T_2]\, J\!\left(\frac{E\left[H(M) - H(M|Y^{T_2})\right]}{E[T_2]}\right) - E[T - T_2]\, D}{1 - P_e - |\mathcal{M}|^{-1}} \tag{52a}$$

$$\ln P_{\{A_2\}}\left[\hat{M} \notin A_2(Y^{T_2})\right] \ge \frac{-h\left(P_e + P[M \in A_2(Y^{T_2})]\right) - E[T - T_2]\, D}{1 - P_e - P[M \in A_2(Y^{T_2})]}. \tag{52b}$$

If $\delta < 1/2$ one can show $P_{\{A_2\}}[\hat{M} \notin A_2(Y^{T_2})]$ is roughly equal to $P_e/\delta$. Thus the inequality in (52b) becomes a lower bound on $E[T - T_2]$ in terms of $P_e$. It can be shown that the lower bound (52a) takes its smallest value for the smallest value of $E[T - T_2]$. Then using Fano's inequality for $E[H(M|Y^{T_2})]$ we obtain Lemma 6 given below.

A complete proof of Lemma 6 for variable length block codes with finite expected decoding time is presented in Appendix F. For variable length block codes with infinite expected decoding time, Lemma 6 follows from the lower bounds on $P_e$ and $P_{e|m}$ derived in Appendix H1 and Appendix H2.

Lemma 6: For any variable length block code and positive <5 such that P e + 5 + + \ A4\~ 1 < 1/2 
whereR=^),E=^,e = ^±p,6 1 = P e + 5+f + \M\^ and e 2 = t&ffi*.

Lemma 6 is a generalization of [2, Theorem 8] and [2, Lemma 1]. While deriving the bounds given in [2, Theorem 8] and [2, Lemma 1], no attention is paid to the fact that the rate of decrease of the conditional entropy of the messages can be different in different time intervals. As a result both [2, Theorem 8] and [2, Lemma 1] are tight only when the error exponent is very close to zero. While deriving the bound given in Lemma 6, on the other hand, the variation in the rate at which the conditional entropy decreases in different intervals is taken into account. Hence the outer bound given in Lemma 6 matches the inner bound given in Section IV-C for all achievable values of error exponent, $0 \le E \le (1 - \frac{R}{C})D$.

Consider a reliable sequence of codes Q with rate Rq and error exponent Eq. Then if we apply Lemma [6] with 

6 = Rife) we § et 

$$E_{md,Q} \le E_Q + \left(1 - \frac{E_Q}{D}\right) J\!\left(\frac{R_Q}{1 - E_Q/D}\right). \tag{54}$$



Note that the upper bound on $E_{md,Q}$'s given in equation (54) is achievable by at least one $Q$ described in Section IV-C.



D. Bit-Wise UEP Converse:

In this section we apply Lemma 5 to a variable length block code with a message set $\mathcal{M}$ of the form $\mathcal{M} = \mathcal{M}_1 \times \mathcal{M}_2 \times \ldots \times \mathcal{M}_\ell$, in order to obtain lower bounds on $P_e(i)$'s for $i = 1, 2, \ldots, \ell$ in terms of the sizes of the sub-message sets $|\mathcal{M}_1|$, $|\mathcal{M}_2|$, $\ldots$, $|\mathcal{M}_\ell|$ and the expected decoding time $E[T]$. When applied to reliable code sequences these bounds on $P_e(i)$'s in terms of $|\mathcal{M}_j|$'s and $E[T]$ give a necessary condition for the achievability of a rate vector and error exponent vector pair $(R, E)$ that matches the sufficient condition for the achievability derived in Section IV-D.

In order to bound $P_e(i)$'s we use Lemma 5 with $\ell$ NALD's, $(T_1, A_1)$, $\ldots$, $(T_\ell, A_\ell)$. Let us start with defining $T_i$'s and $A_i(Y^{T_i})$'s.

• For any $i$ in $\{1, 2, \ldots, \ell\}$, let $T_i$ be the first time instance that a member of $\mathcal{M}^i$ gains a posterior probability larger than or equal to $(1 - \delta)$ if it happens before $T$, $T$ otherwise:

$$T_i \triangleq \min\left\{\tau : \max_{m^i} P[M^i = m^i|Y^{\tau}] \ge 1 - \delta \text{ or } \tau = T\right\}. \tag{55}$$

• For any $i$ in $\{1, 2, \ldots, \ell\}$, let $A_i(Y^{T_i})$ be the set of all messages of the form $m = (m_1, m_2, \ldots, m_\ell)$ for which the posterior probability of $m^i$ is less than $(1 - \delta)$ at $T_i$:

$$A_i(Y^{T_i}) \triangleq \left\{(m_1, m_2, \ldots, m_\ell) \in \mathcal{M} : P[M^i = m^i|Y^{T_i}] < 1 - \delta\right\}. \tag{56}$$



If we apply Lemma 5 for $(T_1, A_1)$, $\ldots$, $(T_\ell, A_\ell)$ defined in equations (55) and (56), we obtain lower bounds on $P_{\{A_i\}}[\hat{M} \notin A_i(Y^{T_i})]$'s in terms of $P[M \in A_i(Y^{T_i})]$'s and $r_j$'s and $E[T_j - T_{j-1}]$'s for $j > i$. In order to turn these bounds into bounds on $P_e(i)$'s we bound $P_{\{A_i\}}[\hat{M} \notin A_i(Y^{T_i})]$'s and $P[M \in A_i(Y^{T_i})]$'s from above.

The posterior probability of a message at time $\tau + 1$ can not be smaller than $\lambda$ times its value at time $\tau$ because $\min_{x \in \mathcal{X}, y \in \mathcal{Y}} W_x(y) = \lambda$. Thus if $\delta < 1/2$ one can bound $P_{\{A_i\}}[\hat{M} \notin A_i(Y^{T_i})]$'s from above:

$$P_{\{A_i\}}\left[\hat{M} \notin A_i(Y^{T_i})\right] \le \frac{1}{\lambda\delta} P_e(i) \qquad \forall i \in \{1, 2, \ldots, \ell\}. \tag{57}$$
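The one-step fact used above, that a posterior can shrink by at most a factor $\lambda = \min_{x,y} W_x(y)$ per channel use, follows because the Bayes update multiplies the prior by $W_x(y) \ge \lambda$ and divides by a normalizer that is at most one. The sketch below checks this numerically on randomly drawn priors and input assignments; the channel matrix and all names are made-up assumptions for illustration.

```python
import random

def one_step_posterior(prior, inputs, W, y):
    """Bayes update after observing y, where message m sends input inputs[m]
    and W[x][y] is the channel transition probability."""
    weights = [prior[m] * W[inputs[m]][y] for m in range(len(prior))]
    z = sum(weights)
    return [w / z for w in weights]

def posterior_never_drops_below_lambda(W, trials=1000, num_messages=4, seed=0):
    """Check post[m] >= lambda * prior[m] on random instances."""
    rng = random.Random(seed)
    lam = min(min(row) for row in W)
    for _ in range(trials):
        raw = [rng.random() + 1e-12 for _ in range(num_messages)]
        s = sum(raw)
        prior = [r / s for r in raw]
        inputs = [rng.randrange(len(W)) for _ in range(num_messages)]
        y = rng.randrange(len(W[0]))
        post = one_step_posterior(prior, inputs, W, y)
        if any(post[m] < lam * prior[m] - 1e-12 for m in range(num_messages)):
            return False
    return True

# Hypothetical two-input, three-output channel with all transitions positive.
W = [[0.7, 0.2, 0.1],
     [0.1, 0.3, 0.6]]
ok = posterior_never_drops_below_lambda(W)
```

A random search of this kind is only a sanity check; the inequality itself holds for every prior, input assignment and output symbol.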



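Note that if at $T_i$ there is an $m^i$ with posterior probability $(1 - \delta)$ then $P[M \in A_i(Y^{T_i})|Y^{T_i}] \le \delta$. If at $T_i$ there is no $m^i$ with posterior probability $(1 - \delta)$ then $P[\hat{M}^i \neq M^i|Y^{T_i}] > \delta$. Using these facts one can bound $P[M \in A_i(Y^{T_i})]$ from above:

$$P\left[M \in A_i(Y^{T_i})\right] \le \frac{P_e}{\delta} + \delta \qquad \forall i \in \{1, 2, \ldots, \ell\}. \tag{58}$$

More detailed derivations of the inequalities given in (57) and (58) can be found in the Appendix. Using equations (57) and (58) together with Lemma 5 we can conclude that,

$$\ln P_e(i) \ge \ln(\lambda\delta) + \frac{-h\left(P_e + \delta + \frac{P_e}{\delta}\right) - \sum_{j=i+1}^{\ell+1} E[T_j - T_{j-1}]\, J(r_j)}{1 - P_e - \delta - \frac{P_e}{\delta}} \qquad \forall i \in \{1, 2, \ldots, \ell\} \tag{59}$$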
provided that $P_e + \delta + P_e/\delta \le 1/2$, where $r_j$'s are defined in (47).

Note that the lower bound on $P_e(i)$'s given in equation (59) takes different values depending on the rate of decrease of the conditional entropy of the messages in different intervals, i.e., $r_j$'s, and the expected duration of different intervals, i.e., $E[T_j - T_{j-1}]$'s. Making a worst case assumption on the rate of decrease of entropy and the durations of the intervals one can obtain Lemma 7 given below.

A complete proof of Lemma 7 for variable length block codes with finite expected decoding time is presented in Appendix G. For variable length block codes with infinite expected decoding time, Lemma 7 follows from the lower bounds on $P_e$ and $P_e(i)$'s derived in Appendix H1 and Appendix H3.

Lemma 7: For any variable length block code with feedback with a message set M. of the formal M. = M. i x 
M.2 x • • • , Me and for any positive 5 such that P e + 5 + ^f- < g, we have 

(1 - e 3 )Ei - h < (1 - E- =1 1i) D + E - =m '/'• / (^r JL ) i = 1, 2, . . . , £ (60a) 

(1 - e 3 )Ri - e 4 l{i=i} < C Vi i = l,2,...,£ (60b) 

for some time sharing vector fj such that 

m > 1 = 1,2,...,* (61a) 

> J1 



E 



(61b) 



wtiere K 4 — , ^ — — ^ — , e 3 — ^ e + <5 + — , e 4 — e 5 — — ^ — . 

For any reliable sequence Q whose message sets Ai^' are of the form = m[ k) x M^f 1 x . . . x M { e K) if 

we set 5 to 5 = Lemma |7] implies that there exists a 77 such tha{^| 

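$$E_{Q,i} \le \left(1 - \sum_{j=1}^{\ell} \eta_j\right) D + \sum_{j=i+1}^{\ell} \eta_j\, J\!\left(\frac{R_{Q,j}}{\eta_j}\right) \qquad \forall i \in \{1, 2, \ldots, \ell\} \tag{62a}$$

$$R_{Q,i} \le C \eta_i \qquad \forall i \in \{1, 2, \ldots, \ell\} \tag{62b}$$

$$\eta_i \ge 0 \qquad \forall i \in \{1, 2, \ldots, \ell\} \tag{62c}$$

$$\sum_{j=1}^{\ell} \eta_j \le 1. \tag{62d}$$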



Recall that a rates-exponents vector $(R, E)$ is achievable only if there exists a reliable code sequence $Q$ such that $(R_Q, E_Q) = (R, E)$. Thus a rates-exponents vector $(R, E)$ is achievable only if there exists a time sharing vector $\eta$ satisfying equation (34). In other words the sufficient condition for the achievability of $(R, E)$ we have derived in Section IV-D is also a necessary condition.
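A condition of this shape is easy to test numerically for a candidate time sharing vector. The sketch below checks the constraints in the form $E_i \le (1 - \sum_j \eta_j) D + \sum_{j > i} \eta_j J(R_j/\eta_j)$, $R_i \le C \eta_i$, $\eta_i \ge 0$, $\sum_j \eta_j \le 1$, which is how we read the condition stated above. The function names and `placeholder_J` are assumptions for illustration; the true $J(\cdot)$, $C$ and $D$ are determined by the channel.

```python
def satisfies_condition(R, E, eta, C, D, J, tol=1e-12):
    """Return True if (R, E) meets the time-sharing constraints for the given eta.

    Indices are 0-based here, so paper index i corresponds to position i - 1.
    """
    ell = len(R)
    if len(E) != ell or len(eta) != ell:
        raise ValueError("R, E and eta must have the same length")
    if any(h < 0 for h in eta) or sum(eta) > 1 + tol:
        return False
    # Rate constraints: each sub-message needs enough of the time budget.
    if any(R[i] > C * eta[i] + tol for i in range(ell)):
        return False
    for i in range(ell):
        budget = (1 - sum(eta)) * D
        for j in range(i + 1, ell):
            if eta[j] > 0:  # eta[j] == 0 forces R[j] == 0 and a zero term
                budget += eta[j] * J(R[j] / eta[j])
        if E[i] > budget + tol:
            return False
    return True

def make_placeholder_J(C, D):
    # Hypothetical stand-in, NOT the paper's J: non-increasing with J(0) = D.
    return lambda rate: D * max(0.0, 1.0 - rate / C)

C, D = 1.0, 2.0
J = make_placeholder_J(C, D)
feasible = satisfies_condition([0.1, 0.1], [1.4, 1.2], [0.2, 0.2], C, D, J)
infeasible = satisfies_condition([0.1, 0.1], [1.4, 1.3], [0.2, 0.2], C, D, J)
```

Deciding achievability of a given $(R, E)$ would additionally require a search over $\eta$; the checker only verifies one candidate.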



VI. Conclusions 

We have considered the single message message-wise and the fixed $\ell$ bit-wise UEP problems and characterized the achievable rate error exponent regions completely for both of the problems.

In the bit-wise UEP problem we have observed that encoding schemes decoupling the communication and the bulk of the error correction both at the transmitter and at the receiver can achieve optimal performance. This result extends the similar observations made for conventional variable length block coding schemes without UEP. However, for doing that one needs to go beyond the idea of communication phase and control phase introduced in
lfT2~ll . and harness the implicit confirmation explicit rejection schemes, introduced by Kudryashov in Q. 

For the converse results, we have introduced a new technique for establishing outer bounds to the performance of variable length block codes, which can be used in both message-wise and bit-wise UEP problems.$^{31}$

We were only interested in bit-wise UEP problem in this paper. We have analyzed single-message message-wise 
UEP problem, because it is closely related to bit-wise UEP problem and its analysis allowed us to introduce the 

29 We tacitly assume, without loss of generality, that $|\mathcal{M}_1| \ge 2$.

30 This fact is far from trivial, yet it is intuitive to all who have worked with sequences of vectors in a bounded subset of $\mathbb{R}^\ell$, where $\mathbb{R}^\ell$ is the $\ell$ dimensional real vector space with the norm $\|x\| = \sup_j |x_j|$. For details see Appendix J.

31 We have not employed the bound in any hybrid problem but it seems the result is abstract enough to be employed even in those problems with judicious choice of NALD's.
ideas we use for bit-wise UEP, gradually. However it seems using the technique employed in [2, Theorem 9] on the achievability side and Lemma 5 on the converse side, one might be able to determine the achievable region of rates-exponents vectors for variable length block codes in the message-wise UEP problem. Such a work would allow
us to determine the gains of feedback and variable length decoding, because Csiszar Q had already solved the 
problem for fixed length block codes. 

Arguably, the most important shortcoming of our bit-wise UEP result is that it only addresses the case when the number of groups of bits $\ell$ is a fixed integer. However this has more to do with the formal definition of the problem we have chosen in Section III than our analysis and non-asymptotic results given in Sections IV and V, i.e., Lemma 4 and Lemma 7.

Using the rates-exponents vectors for representing the performance of a reliable sequence with bit-wise UEP is apt only when the number of groups of bits is fixed or bounded. When the number of groups of bits $\ell_\kappa$ in a reliable sequence diverges with increasing $\kappa$, i.e., when $\lim_{\kappa \to \infty} \ell_\kappa = \infty$, the rates-exponents vector formulation becomes
fundamentally inapt. Consider, for example, a reliable sequence in which |.M^| = [e^ 7 ' 1^]. The rate of this 
reliable sequence is $R$, yet the rates of all of the sub-messages are zero. Thus when $\ell_\kappa$ diverges the rate vector does not have the same operational relevance or meaning it has when $\ell_\kappa$ is fixed or bounded. In order to characterize the change of error performance among sub-messages in the case when $\ell_\kappa$ diverges, one needs to come up with an alternative formulation of the problem, in terms of the cumulative rate of sub-messages.

Our non-asymptotic results are useful to some extent even when $\ell_\kappa$ diverges. Although infinite dimensional rates-exponents vectors fall short of representing all achievable performances, one can still use Lemma 4 of Section IV and Lemma 7 of Section V to characterize the set of achievable rate vector error exponent vector pairs.

• As a result of Lemma 7 the necessary condition given in equation (19) is still a necessary condition for the achievability of a rates-exponents vector.

• Using Lemma 4 we see that the sufficient condition given in equation (19) is still a sufficient condition as long as the number of sub-messages in the reliable sequence satisfies $\limsup_{\kappa \to \infty} \frac{\ell_\kappa \ln E[T^{(\kappa)}]}{E[T^{(\kappa)}]} = 0$.

Thus for the case when $\ell_\kappa = o\left(\frac{E[T^{(\kappa)}]}{\ln E[T^{(\kappa)}]}\right)$, i.e., $\limsup_{\kappa \to \infty} \frac{\ell_\kappa}{E[T^{(\kappa)}] / \ln E[T^{(\kappa)}]} = 0$, the condition given in equation (19) is still a necessary and sufficient condition for the achievability of a rates-exponents vector.
Appendix

A. Proof of Lemma 1

Proof: Note that $J(R)$ defined in equation (11) is also equal to

J(R) = max aD faW W Xl ) + (1 - a)D (/2 2 || W x . 2 ) 

0<a<l 
Xi,x 2 6A' 

«,xi,x2,/^i,/i9,RiiR' : Ri ,R-2£[0, C] 
I(Ati,W0>Ri 
I(,u 2 ,W0>R 2 
aRi+(l-a)R 2 =R 

= max aj(Ri) + (1 — a)j(R2) (63) 

0<a<l 
Q!,Ri,R 2 : Ri,R 2 e[0,C*] 

aRi+(l-a)R 2 =R 

where j(R) is given by 

i(R)= max B(fl\\W x ) VR G C. (64) 
xeA" 

l(/i,W)>R 
Note that $j(R)$ is a bounded real valued function of a real variable. Therefore, Carathéodory's Theorem implies that considering two point convex combinations suffices in order to make $j(R)$ a concave function. In other words for any $k$ we have,

v — >fc 

max aj(Ri) + (1 — a)j(R2) = max > aij(Ri) . (65) 

0<a<l 0<Qi<l Vi z — 'i=l 

a,Ri,R->: Ri,R2£[0,C] c.i,..,a t , 0<R,<C Vi 
aRi + (l-a)R 2 =R H H. ' E;a;=l 



Then the concavity of $J(R)$ follows from the equations (63), (64) and (65).

Evidently if the constraint set in a maximization is curtailed then the resulting maximum value can not increase. Hence the $J(R)$ function defined in equation (11) is a decreasing function of $R$.

As a result of the definition of D given in equation (© and the convexity of Kullback-Leibler divergence, we 
have D > J(0). On the other hand D (//|| W x ) = D and I (/x, W) > for x = r and fi(-) = l{.= a } where a and 
r described in equation (fT0~b . Therefore we have j(0) > D. Using the fact that </(R) > j(R) we conclude that 

j(o)=j(o) = D. m 

B. Proof of Lemma 2

Proof: We prove the lemma for a slightly more general setting and establish a result that will be easier to 
make use of in the proofs of other achievablity results. Let <5 7 [1], £/ 7 [m] and £> 7 [x n ] be 

g 7 [l] = {y n :n Q A(Q {y? « } ,/2 1 ) + (n - n a )A (Q {y „ Q+i} , /2 2 ) > 7} (66a) 

7 [m] = £ 7 [x n (m)] f| (n^ m £ 7 [x-(m)]) Vm 6 {2, 3, . . . , \M\} (66b) 



£ 7 [x n ] = {y 11 : n a A(Q K c iy n Q} , / xi^) + (n - n a )A(Q Ka+i)yL+i}) ^ < 7}. (66c) 

Note that g[m] and i3[x n ] given equations (EU), <E2]) and (f23l) are simply the Q 7 [l], £ 7 [m] and # 7 [x n ] for 
l = \X\\yWnMl + u). 
For all y n g <? 7 [1] we have, 

n Q D (Q {y - } || W Xl ) +(n-n Q )D(Q {y n Q+i} || W X2 

= n a D (Q{ y ^}|| /ii) + (n - n a )D (Q{ y » Q+1 }|| fi 2/ 



+ n a ^ Q {y?Q } (y) In $fgy + (n - n Q ) £ Q {y * a+l} (y) In -BM 



> n Q £ Q{y"°}(y) ^ #fgj + (n - n«) £ Q {yiUl }(y) In ^fgj 
y y 

(6) 

^n^DfeU ^ Xl ) + (n-n a )D(/2 2 || W X2 ) + 2 7 lnA. 



Inequality (a) follows from the non-negativity of the Kullback Leibler divergence. In order to see why (b) holds, 
first recall that min X)y W x (y) = A. Hence |ln 7^ | < In j and |ln ^- 2 ^ | < In j. Then the inequality (b) follows 
from the definitions of total variation A and £7 7 [1], given in equations ® and (I66ab and the fact that y 11 ^ <? 7 [1]- 



Note that the conditional error probability of the first message is given by 

M = 1 



P e|1 =p[lv1/1 



= P[Y n = y n | M = 1] . 

Recall that, the codeword of the message M = 1 is the concatenation of n a xi's and (n — n a ) X2's where n a = 
[na\. Hence the probability of all y n 's whose empirical distribution in first n a times instances is Q{ y ^°} and whose 

empirical distribution in [(n Q + l),n] is Q{ y n j is upper bounded by e naD ( < ^ty? Q }|| ( n n a) D (Q{yS 01 +i>|| _ 
Furthermore, there are fewer than $(n_\alpha + 1)^{|\mathcal{Y}|}$ distinct empirical distributions in the first phase and there are fewer than $(n - n_\alpha + 1)^{|\mathcal{Y}|}$ distinct empirical distributions in the second phase. Thus

p e]1 < (na + l)|y|( n _ nQ + 1 pi e -n a D( Al ||^ 1 ) +(a -n ( »)D( fe ||^ 2 )-2 7 lnA 

< g-nCoD^H ^.J+a-ajD^H W X2 )-e 2 ( 7! n)) 

, . . -27lnA+£>+2|y|ln(n+l) 

where e 2 ( 7 ,n) = — ! n — ^ -. 

The codewords and the decoding regions of the remaining messages are specified using a random coding argument together with an empirical typicality decoder. Consider an ensemble of codes in which the first $n_\alpha$ entries of all the codewords are independent and identically distributed (i.i.d.) with input distribution $\mu_1$ and the rest of the entries are i.i.d. with the input distribution $\mu_2$.

For any message m other than the first one, i.e., m ^ 1, the decoding region is <7 7 [m] given in (I66bb . In other words 
for any message m, other than the first one, the decoding region is the set of output sequences for which (x n (m), y n ) 
is typical with (a, n% W, fi2 W), i.e., y 11 6 £> 7 [x n (m)], and (x n (m),y n ) is not typical with (a, \i\ W, /i 2 W), i.e., 
y n G £> 7 [x n (m)], for any m 7^ m. 

Since the decoding regions of different messages are disjoint, above described code does not decode to more 
than one message. Disjointness of decoding regions of messages 2,3,. . follows from the definitions of 

£7 7 [2],£/ 7 [3],. . .,<5 7 [|.M|], given in equation (166bl) . In order to see why <7 7 [1] n (U m ^i£/ 7 [m]) = holds, note that 
for any pair probability of distributions, the total variation between them is lower bounded by the total variation 
between their marginals. In particular, 

A(Q{ x ' J '°(m),y' 1 1 °}^l^) > A (Q{y?<*}>£l) 
A (Q{x" Q + 1 (m),yS Q + 1 }^2^) > A(Q {y n Q + i} ,/i 2 ) . 

Then as results of definitions of £ 7 [1], £> 7 [x n ] and <5 7 [m] for m / 1 given in equations d66a| ), d66cb and d66bb we 
have 

g 7 [l] n^ 7 [m] = m = 2,3,...|X|. 

Then for m G {2, 3, . . . , \A4\} the average of the conditional error probability of m th message over the ensemble 
is upper bounded as 

E[P eh ] < P[Y n i £ 7 [X n (m)]| M = m] + £ P[Y n 6 £ 7 [X n (m)]| M = m] . (67) 

Let us start with bounding P[Y n ^ # 7 [X n (m)]| M = m]. Let Si(x,y) and S 2 (x,y) be 

Si(x,y)=n Q |Q { x^ (m ) iY °°}(x,y) - m(x)W x (y)\, (68a) 
S 2 (x, y)=(n - n a ) \ Q{x« a+1 (nO.Y^+j (x, y) - M2 W W x (y) I • (68b) 

As a result of the definition of total variation distance given in equation (Q~|) and above definitions we have 

n a A(QpQ<* (m ) )Y » a} ,/iiP7) + (n-n Q )A^Q {X n i+l ( m ),Ys i+1 }>^2W / ) = 5 ^ xy [Si(x,y) + S 2 (x,y)] 
Thus the definition of ,B 7 [x n (m)] given in equation (I66cb implies that 



P[Y $ £ 7 [X n (m)]| M = m] 



$}Si(x > y) + S 2 (x,y)] > 2 7 



x.y 



M 



m 



(69) 



If for all x G X, y G y and j G {1,2}, S^x.y) < -7 1 | ~ 1 1 ~ 1 then £ X)y [Si(x,y) + S 2 (x,y)] < 2 7 . Thus if 
Y ^ ,6 7 [X n (m)] then for at least one (x,y, j) triple Sj(x,y) > 7|A'|~ 1 |3^| _1 . Using the union bound we get 



[E x , y [ S i( x <y) + S 2 (x,y)] > 2 7 | M = m] < P [s^x.y) > ^ 

[Sj( x >y) 



M 



m 



(70) 



For bounding P|Sj(x, y) > [^jjyj M = m , we can simply use Chebyshev's inequality, however in order to get 
better error terms we use a standard concentration result about the sums of bounded random variables, H Theorem 
5.3]. 
Lemma 8: Let $Z_1, Z_2, \ldots, Z_k$ be independent random variables satisfying $|Z_i - E[Z_i]| \le c_i$ for all $1 \le i \le k$. Then

$$P\left[\left|\sum_{i=1}^{k} (Z_i - E[Z_i])\right| \ge \gamma\right] \le 2 e^{-\frac{\gamma^2}{2\sum_{i=1}^{k} c_i^2}}.$$

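For all $\mu_1 \in \mathcal{P}(\mathcal{X})$, $x \in \mathcal{X}$, $y \in \mathcal{Y}$ we have $c_i = 1$ for all $i = 1, 2, \ldots, n_\alpha$; thus

$$P\left[S_1(x, y) \ge \frac{\gamma}{|\mathcal{X}||\mathcal{Y}|} \,\middle|\, M = m\right] \le 2 e^{-\frac{\gamma^2}{2|\mathcal{X}|^2|\mathcal{Y}|^2 n_\alpha}} \le 2 e^{-\frac{\gamma^2}{2|\mathcal{X}|^2|\mathcal{Y}|^2 n}}.$$

Similarly,

$$P\left[S_2(x, y) \ge \frac{\gamma}{|\mathcal{X}||\mathcal{Y}|} \,\middle|\, M = m\right] \le 2 e^{-\frac{\gamma^2}{2|\mathcal{X}|^2|\mathcal{Y}|^2 n}}.$$

Using equations (69), (70), (71) and (72) we get

$$P\left[Y^n \notin \mathcal{B}_\gamma[X^n(m)] \,\middle|\, M = m\right] \le 4|\mathcal{X}||\mathcal{Y}|\, e^{-\frac{\gamma^2}{2|\mathcal{X}|^2|\mathcal{Y}|^2 n}}.$$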
Now we focus on P[Y n G i3 7 [X n (m)]| M = m] terms. Note that all y n in i3 7 [x n (m)] satisfy 

n a A(Q{^(m) iy ;«},/ii VF) +(n-ni)Am {xSa+i ( ffi ) iySci+i} ,/i2T^J < 7- 
On the other hand, when M = m, X n (m) and Y n are independent and their distribution is given by, 
P[(X n (m), Y n ) = (x n (m),y n )| M = m] = JJ"" ^(m))/^*) ^ ^(x, (m))/x 2 ( y ,) 

= e - n = D (Q{x^ (5,), y ^ } H^iAi) e -n Q H(Q {x „ Q (S) y „ Q } ) 

-(n-n a )D(Q {x „^ + i(S) . y „ Q + i} ||/x 2 A 2 ) -(n-n Q )H(Q {x n^ + i( ~ )y „^ + i} ' 



(71) 



(72) 



(73) 



(74) 



(75) 



Furthermore the number of (x^° (m), y^°) sequences with an empirical distribution Q{ x na (m),yl a } is upper bounded 

as e n ° H (^ {x " 0(ff,) y i Q> ). In addition there are at most (n Q + 1)1*11^1 different empirical distributions. Using these two 
bounds and their counter parts for (x£ +1 (m),y^ +1 ) together with equations (l74b and (TTSb we get 

P[Y n G £ 7 [X n (m)]| M = m] < (n a + l)™(n- n a + i)l*Pl e -n»D( Ml w||^0-(a-n„)D(^ wi^j-^inA 



(n a + l)l*H y l(n-n a + l) 



l- ; f||yL-n Q I(/ii,^)-(n-n Q )I( M2 ,W)-27lnA 



< e -n(aI(Mi,W)+(l-a)I(jU3,W)) e C+2|-V||3;|lji(n+l)-27lnA 
7 2 

Hence if |X \ {1}| = 4|Af| |^|e 2\x\ 2 \y\ 2 n e n( a i^ 1 ,w)+(i~a)i( f , 2 ,w)) e ~c~2\x\\y\in(n+i)+2- y hi\ then 

V P[Y n G £ 7 [X n (m)]| M = ml < 4\X\y\e 
Thus the average P e over the ensemble can be bounded using (I67T ). (f70l > anddTTt as 



(76) 



(77) 



E[P e ] < 8|#| |^|e 2|^| 2 |y|%. 

But if the ensemble average of the error probability is upper bounded like this, there is at least one code that has this low error probability. Furthermore half of its messages have conditional error probabilities less than twice this average. Thus for any block length $n$, time sharing constant $\alpha \in [0, 1]$, input letters $x_1, x_2 \in \mathcal{X}$, input distributions $\mu_1, \mu_2 \in \mathcal{P}(\mathcal{X})$ there exists a length $n$ code such that

|_A4\{1}| > e n ( Ql Oi> W)+(i-a)i(n 2 , w)-ei(7,n)) (78a) 

p < e -n(aD(/i 1 ||W xl )+(l-a)D(p 2 ||W )<2 )- £2 (7,n)) (7 g b) 
Pe|m<£3(7,n) m = 2, 3, . . . , \M | (78c) 
where



ei(7jn) = C-ln(2|^||y|)+2|^||y| ln(n+l)-2 7 lnA) + 



2|^| 2 |y| 2 n 2 



7J+2|y|ln(n+l)-27 In A 
n 

7 2 



£2(7>n) 

e3(7,n) = 16|A-||^| e ~Wm^. 

Lemma |2] follows from equation (1781 ) and the fact that 

9|A'||y|(l-lnA) v /ln(l+n) 



£i(7>n) 



7=l^l|y|V nln (l+n) 



< 



1,2,3 



for £i(7,n), £2(7.n) and £3(7,11) given in equation d79"T ). 



(79a) 
(79b) 

(79c) 



(80) 



C. Proof of Lemma 3

Proof: Let ni be m = [(1 — §)n]. Recall that we have assumed E < (1 — |y)-D, then we have |j < 1 — 
Consequently ^ < ^ and ^-R < C . On the other hand as a result of equation (|78T ) and the definition of J(-) 
given in equation (fTTT ). for any positive integer ni, positive real number 71, rate R < C there exists a length ni 
code such that, 



|.A/f| — 1 > e ni[R-£i(7i.ni)] 

p . < e -ni[j(R) -£2(71,111)] 
e|l — 



m 



2,3,...,|A<| 



(81a) 

(81b) 
(81c) 



-P e |m < £3(7i,ni) 

where £1(71,111), £2(71,111), £3(71,111) are given in equation d79l ). 

We use such a code in the first phase with R = ^-R and call its decoded message t M, the tentative decision. 

Then as a result of equation (Hfl and the fact tha^l nijY^R^ > n(l - j^jY ^E/g ) we get 

^ y, g nR— ni£i(7i,ni) 

-n(l-|)<ir| 7S )+m£2(7i,m) 



P 
P 



|M| 

tM + m M = 1 
t M / m M 



m 



< e 

< £3(71,111) 



m 



2,3,...,|M|. 



(82a) 
(82b) 
(82c) 



The transmitter knows what the tentative decision is and determines the channel inputs in the last (n — ni ) time 
instances depending on its correctness. If t M = M the channel inputs in the last (n — ni) time instances are all a, 
if t M 7^ M the channel inputs in the last (n — ni) time instances are all r. 

After observing Y n , receiver checks whether the empirical distribution of the channel output in the last (n — ni) 
time units is typical with W a , if it is then M = t M otherwise M = x. Hence the decoding region for erasures is 
given by 



g 7 [x] = {v 11 : (n - m)A(Q {ySi+i} , W a ) > 72} 
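The receiver's erasure rule in the last phase can be sketched in a few lines. The code below assumes total variation is defined with the factor one half, represents distributions as dictionaries, and uses hypothetical names (`control_phase_decision`, `gamma2`, `W_a`) that merely mirror the symbols in the text; it is an illustration of the rule, not the authors' implementation.

```python
from collections import Counter

def total_variation(p, q):
    """Total variation distance, assuming the convention (1/2) * sum |p - q|."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

def empirical_distribution(seq):
    """Empirical distribution (type) of a non-empty sequence."""
    n = len(seq)
    return {symbol: count / n for symbol, count in Counter(seq).items()}

def control_phase_decision(outputs, n1, W_a, gamma2, tentative):
    """Declare an erasure when the last n - n1 outputs are not typical with W_a,
    i.e. when (n - n1) * TV(empirical, W_a) >= gamma2; otherwise accept the
    tentative decision made at the end of the first phase."""
    tail = outputs[n1:]
    deviation = len(tail) * total_variation(empirical_distribution(tail), W_a)
    return "erasure" if deviation >= gamma2 else tentative

# Hypothetical output distribution when the accept letter a is sent.
W_a = {0: 0.9, 1: 0.1}
accepted = control_phase_decision([0] * 10 + [0] * 9 + [1], 10, W_a, 2.0, tentative=3)
rejected = control_phase_decision([0] * 10 + [1] * 10, 10, W_a, 2.0, tentative=3)
```

The threshold trades the two error events against each other: a larger `gamma2` lowers the erasure probability after a correct tentative decision but makes it easier for an incorrect tentative decision to be accepted.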



Let us start with bounding P 
First note that 



M 



x 



t M = m, M = m 



, i.e., the probability of erasure for correct tentative decision. 



(n-m)A(Q {YSi+i} , W a ) =IY, S (V) 



where S(y) = (n — ni)|Q{yj j(y) — Wa(y)|- Then following an analysis similar to that one presented between 
equations (|69l and (1731 ) we get 



M 



,-M = m, M 



m 



<2\y\e 2 ^l 2 (n-ni) 
= £3(72,n-ni) 

32 Recall that ni > (1 — E/Z))n and J(-) is a non-increasing and positive function. 



Vm G M. 



(83) 
In order to bound the probability of non-erasure decoding when tentative decision is incorrect, note that
P [YS I+ i = y£ 1+1 | tM + m, M = m] = ]T j=ni+1 W M 

= e ~^ n " ni ' )D ( Q{y Si+i } ||' i2 ) e _ ^ n ~ ni ^ H ( Q{y Si+i > ) 
Then following an analysis similar to the one between (|75T ) and (T76T > we get 

M / x t M / m, M = ml < min{(n - m + i)|y| e -(n-n 1 )D-2 73 in A ^ 

< m i n { e -nE+|y|ln(n-n 1 )+D-2 72 lnA ) 1 | 

< min{e^ nE+(n - ni)£2(72 ' n - ni) , 1} Vm G M. 



Furthermore the conditional error and erasure probabilities can be bounded in terms of P 



tM/m 



M / x 



t M / m, M = m 



and P 



M = x 



t M = m, M = m 



as follows. 





,=P 


tM ^ m 


M = m 


P x\n 


,<P 


tM/m 


M = m 



M /x 



t M / m, M = m 



+ P 



M = x 



t M = m, M = m 



Vm G M 
Vm G M. 



Using the equations (1821) . (1831 ). (184b and ([85]) we get 

l^fl _ 1 > e nR-niEi(7i,n I ) 

p < e ^ n ( 1 -f) J (-rrlA5)+ n i e 2(7i,ni) mm | e -nE+n 2 e 2 (7 2 ,n 2 ) ; 

P X |1 < e-^-S^i^)^^) +£3(72in2) 
^elm < £3(7,110 minle-^+^l 111 ^ 1 ) 4 - 11252 ^ 2 ' 112 ),!} 

P x |m < £3(7i,ni) + £3(72,112) 

where n2 = n — ni . 



m / 1 



We set jj = \X\\y\y/5nj\n(l + n) for j = 1,2 and obtain 



n J -£i(7„n i ) < 2|#| 1^1 (ln(n + 1) - V5nln(l + n) In A) + (5/2) ln(l + n) 
n J -£ 2 (7 j ,n 3 -) < 2|^p|(ln(n + 1) - \/5nln(l + n) In A) + D 
£3(7^) < 16|Af||^|/(l+n) 5 / 2 

Lemma [3] follows from the identities \X\ >2, \y\ >2, D<ln(j), n> 1 and the equations (l86l ) and (I87T ). 



(84) 
M = m 



(85a) 
(85b) 

(86a) 

(86b) 

(86c) 
(86d) 
(86e) 



(87a) 
(87b) 
(87c) 



D. Proof of Lemma 4

Proof: Note that given the encoding scheme summarized in equation (1281 and the decoding rule given in 
equation (|29]>, if M = x then there is a i < £ + 1 such that t Mj = t Mj for all j < i and t Mj / t Mi- Thus the 
conditional erasure probability P x | m is upper bounded as 



V P A^il + mi 

* — '1=1 



M = m, t Mi = t Mi, . . . , tMi-i = Mi-! 



E m p 

-t— 'i=l 



tMi ^ tM, 



tMi = 1 + m, 



(88) 



Similarly if M ^ x and M* ^ W then for all j > z, t Mj = 1 and t Mj ^ 1; furthermore there is a & < i such that 
t Mj = t Mj for all j < k and t Mk / t Mfe. Hence one can bound P e \ m (i) as 

tM,- / t M, | tM,- = 1 + mj] ] n'l' p [tM, / 1 



^ m »<[E; =1 



t M, = 1 



(89) 
In the first £ phases, we use rij = L^nJ l° n g codes with rate Si with the performance given in equation ( f8TT >.
Thus for 1 < i < i we have, 



'R. 



tM ? ; = 1 



tMi / t M, 



tMi = 1 + m 2 



< e 

< e3(7„n,) 



m, = 1,2,3, ... , (| t A4,| - 1) 



(90a) 

(90b) 
(90c) 



where £1(74,14), £2(74,11,), £3(74,114) are given in equation ( 1791) . 

In order derive bounds corresponding to the ones given in equation (l90l for the last phase let us give the decoding 
regions for 1 and 2 for the length n^+i code employed between (n + 1 — n^ + i) and n. 



S 7 [l] = K+i-n f+1 : n m A(Q {y n +i nf+i} , W a ) > 

= K+i-n f+1 = n £+1 A(Q {yl+i ne+i} , W a ) < 7m }. 
Following an analysis similar to the one leading to equations (l83l) and (l84l) we get 



p 


tM^+i 


7^1 


tM £+ i 


= 1 


^ e -n f+ i_D+n f+ i£ 2 (7«+ii n t , +i 


p 


tMm 


7^2 


tM^+i 


= 2 


< e3(7f+i,n f+ i) 



Using equations ([88]), d89]>, (O and (ED we get, 

\Mi\ > e nR '^ £ ^'^- c 



Vi = 1,2,... ,£ 
Vi = 1,2,... ,£ 



(91a) 
(91b) 

(92a) 
(92b) 



n E 7T n Yl n 3 e 2 (7 J ,n J )+D 



^e|m(*) < 53 £3 ^' n ^ min i 1 ' e 
J=l 

-P x |m < ^^3(7j.n 3 ) 

i=i 

If we set 7, = | 1 13^| \/ 4n^ ln(l + n) for i = 1, 2, . . . , {1 + 1) for ei( 7i ,n i ), £2(74,114) and £3(7^) given in equation 
d79l we have 



Vi = 1, 2, . . . , £, Vm e A4 (92c) 
Vm e X (92d) 



ni£i( 7i ,n0 + C < 2|Af||3 ; |(ln(n i + 1) - ln(l + n) In A) + 21n(l + n) + C 
^£2(7^,) + D < 2|^p|(ln( ni + 1) - ^4^ ln(l + n) In A) + 2D 
£3(74,114) < 16|^||^|/(1 + n) 2 

Using the concavity of the $\sqrt{\cdot}$ function we can conclude that,

i+i , 

J2 ^lkMh)±0 < ^^l^l AW)ln(l+n) _ (^±1) 2 /_n_ ln(1 + n)lnA \ + 2(€+l) ln(l+n) + (*±1) ^ ^ 



E 



ni£2(7i,ni)+_D 



€+1 



<2|^||y| 



(l+l)ln(l+n) _ (£+1). 



^2^1 n (l + n)lnA)+^2£ 



E £ 3(74,n4)<8|^||^||±i 



(93b) 
(93c) 



Then Lemma |4] follows from equations d92l and d93l for any ^ < ln / 
E. Proof of Lemma 5

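Proof: For $P$ defined in equation (36), as a result of equation (37) we have

$$P\left[\hat{M} \in \mathcal{A}_i(Y^{T_i})\right] = \sum_{y^t \in \{y^t :\, \hat{M} \in \mathcal{A}_i(Y^{T_i})\} \cap \mathcal{Y}^{T*}} P(y^t) \tag{94a}$$

$$P\left[\hat{M} \notin \mathcal{A}_i(Y^{T_i})\right] = \sum_{y^t \in \{y^t :\, \hat{M} \notin \mathcal{A}_i(Y^{T_i})\} \cap \mathcal{Y}^{T*}} P(y^t) \tag{94b}$$

For $P_{\{\mathcal{A}_i\}}$ defined in equation (39), as a result of equation (41) we have

$$P_{\{\mathcal{A}_i\}}\left[\hat{M} \in \mathcal{A}_i(Y^{T_i})\right] = \sum_{y^t \in \{y^t :\, \hat{M} \in \mathcal{A}_i(Y^{T_i})\} \cap \mathcal{Y}^{T*}} P_{\{\mathcal{A}_i\}}(y^t) \tag{95a}$$

$$P_{\{\mathcal{A}_i\}}\left[\hat{M} \notin \mathcal{A}_i(Y^{T_i})\right] = \sum_{y^t \in \{y^t :\, \hat{M} \notin \mathcal{A}_i(Y^{T_i})\} \cap \mathcal{Y}^{T*}} P_{\{\mathcal{A}_i\}}(y^t) \tag{95b}$$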

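Using equations (94) and (95) together with the data processing inequality for Kullback-Leibler divergence, we get

$$\sum_{y^t \in \mathcal{Y}^{T*}} P(y^t) \ln \frac{P(y^t)}{P_{\{\mathcal{A}_i\}}(y^t)} \ge P\left[\hat{M} \in \mathcal{A}_i(Y^{T_i})\right] \ln \frac{P[\hat{M} \in \mathcal{A}_i(Y^{T_i})]}{P_{\{\mathcal{A}_i\}}[\hat{M} \in \mathcal{A}_i(Y^{T_i})]} + P\left[\hat{M} \notin \mathcal{A}_i(Y^{T_i})\right] \ln \frac{P[\hat{M} \notin \mathcal{A}_i(Y^{T_i})]}{P_{\{\mathcal{A}_i\}}[\hat{M} \notin \mathcal{A}_i(Y^{T_i})]}.$$

Since $0 \le P_{\{\mathcal{A}_i\}}[\hat{M} \in \mathcal{A}_i(Y^{T_i})] \le 1$ we have

$$\sum_{y^t \in \mathcal{Y}^{T*}} P(y^t) \ln \frac{P(y^t)}{P_{\{\mathcal{A}_i\}}(y^t)} \ge -h\left(P[\hat{M} \in \mathcal{A}_i(Y^{T_i})]\right) + \left(1 - P[\hat{M} \in \mathcal{A}_i(Y^{T_i})]\right) \ln \frac{1}{P_{\{\mathcal{A}_i\}}[\hat{M} \notin \mathcal{A}_i(Y^{T_i})]}. \tag{96}$$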



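Note that if $\hat{M} \in \mathcal{A}_i(Y^{T_i})$ and $M \notin \mathcal{A}_i(Y^{T_i})$ then $\hat{M} \neq M$. Consequently

$$P\left[\hat{M} \in \mathcal{A}_i(Y^{T_i})\right] = P\left[\{\hat{M} \in \mathcal{A}_i(Y^{T_i}), M \notin \mathcal{A}_i(Y^{T_i})\}\right] + P\left[\{\hat{M} \in \mathcal{A}_i(Y^{T_i}), M \in \mathcal{A}_i(Y^{T_i})\}\right] \le P_e + P\left[M \in \mathcal{A}_i(Y^{T_i})\right]. \tag{97}$$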



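Since the binary entropy function $h(\cdot)$ is increasing on the interval $[0, 1/2]$, if $P_e + P[M \in \mathcal{A}_i(Y^{T_i})] \le 1/2$ equations (96) and (97) imply

$$\sum_{y^t \in \mathcal{Y}^{T*}} P(y^t) \ln \frac{P(y^t)}{P_{\{\mathcal{A}_i\}}(y^t)} \ge -h\left(P_e + P[M \in \mathcal{A}_i(Y^{T_i})]\right) + \left(1 - P_e - P[M \in \mathcal{A}_i(Y^{T_i})]\right) \ln \frac{1}{P_{\{\mathcal{A}_i\}}[\hat{M} \notin \mathcal{A}_i(Y^{T_i})]}. \tag{98}$$

Let $B$, $B^*$ and $B_\tau$ be

$$B \triangleq \ln \frac{P(Y^T)}{P_{\{\mathcal{A}_i\}}(Y^T)} \tag{99a}$$

$$B^* \triangleq \ln \frac{P(Y^T)}{P_{\{\mathcal{A}_i\}}(Y^T)}\, \mathbb{1}_{\{T < \infty\}} \tag{99b}$$

$$B_\tau \triangleq \ln \frac{P(Y^{T \wedge \tau})}{P_{\{\mathcal{A}_i\}}(Y^{T \wedge \tau})} \qquad \forall \tau \in \{1, 2, \ldots\} \tag{99c}$$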

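where $T \wedge \tau$ is the minimum of $T$ and $\tau$.

Note that as $\tau$ goes to infinity, $B_\tau \to B$ and $B_\tau \to B^*$ with probability one. Since $|B_\tau| \le T \ln\frac{1}{\lambda}$ and $E[T] < \infty$, we can apply the dominated convergence theorem [10, Theorem 3 p 187] to obtain

$$E[B] = E[B^*] = \lim_{\tau \to \infty} E[B_\tau]. \tag{100}$$

Finally for $B$ and $B^*$ defined in equation (99) we have

$$E[B] = E\left[\ln \frac{P(Y^T)}{P_{\{\mathcal{A}_i\}}(Y^T)}\right] \tag{101a}$$

$$E[B^*] = \sum_{y^t \in \mathcal{Y}^{T*}} P(y^t) \ln \frac{P(y^t)}{P_{\{\mathcal{A}_i\}}(y^t)} \tag{101b}$$

Thus as a result of equations (100) and (101) we have

$$E\left[\ln \frac{P(Y^T)}{P_{\{\mathcal{A}_i\}}(Y^T)}\right] = \sum_{y^t \in \mathcal{Y}^{T*}} P(y^t) \ln \frac{P(y^t)}{P_{\{\mathcal{A}_i\}}(y^t)}. \tag{102}$$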
Furthermore using the definition of $P_{\{\mathcal{A}_i\}}$ given in equation (39) we get



E 



In 



^]= E ['» 



P(Yl i + 1 |Y T -) 
W Y I !+1 |Y T 



where for all z > 1 and j > i 



E 



In 



p(Y T ^:+iiY T o 



PfA l} (Y T ^i|Y T ,) 



if p[T i+1 = !,,•] = r 

ifP[T J+1 = T,]<lf • 



Assume for the moment that, 



(103) 



(104) 



(105) 



where $T_{k+1} = T$ and $r_j$ is defined in equation (47).

Then Lemma 5 follows from equations (98), (102), (103) and (105).

Above, we have proved Lemma 5 by assuming that the inequality given in (105) holds for all $i$ in $\{1, 2, \ldots, k\}$ and $j$ in $\{(i+1), \ldots, (k+1)\}$; below we prove that fact.

First note that if $P[T_{j+1} = T_j] = 1$ then as a result of equations (47) and (103), equation (105) is equivalent to $0 \le 0 \cdot J(0)$, which holds trivially. Thus we assume henceforth that $P[T_{j+1} = T_j] < 1$, which implies $E[T_{j+1} - T_j] > 0$.

Let us consider the stochastic sequence 



In 



P(YT. +1 |Y T .) 



T^VE J( I ( M;Y 



fc-i 



1 



{t>T,} 



(106) 



where $I(M; Y_k | Y^{k-1})$ is the conditional mutual information between $M$ and $Y_k$ given $Y^{k-1}$, defined as

$$I(M; Y_k | Y^{k-1}) \triangleq E\left[\ln \frac{P(Y_k | M, Y^{k-1})}{P(Y_k | Y^{k-1})} \,\middle|\, Y^{k-1}\right].$$

Note that, as was the case for conditional entropy, while defining the conditional mutual information we do not take the average over the conditioned random variable. Thus $I(M; Y_k | Y^{k-1})$ is itself a random variable.
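For $U_\tau$ defined in equation (106) we have

$$U_{\tau+1} - U_\tau = \left(-\ln \frac{P(Y_{\tau+1} | Y^\tau)}{P_{\{\mathcal{A}_i\}}(Y_{\tau+1} | Y^\tau)} + J\left(I(M; Y_{\tau+1} | Y^\tau)\right)\right) \mathbb{1}_{\{\tau \ge T_j\}}. \tag{107}$$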

Conditioned on $Y^\tau$, the random variables $M - X_{\tau+1} - Y_{\tau+1}$ form a Markov chain; thus as a result of the data processing inequality for the mutual information we have $I(X_{\tau+1}; Y_{\tau+1} | Y^\tau) \ge I(M; Y_{\tau+1} | Y^\tau)$. Since $J(\cdot)$ is a decreasing function this implies that

$$J\left(I(M; Y_{\tau+1} | Y^\tau)\right) \ge J\left(I(X_{\tau+1}; Y_{\tau+1} | Y^\tau)\right). \tag{108}$$



Furthermore, because of the definitions of $J(\cdot)$, $P$ and $P_{\{\mathcal{A}_i\}}$ given in equations (11), (36) and (39), the convexity of Kullback-Leibler divergence and Jensen's inequality we have

$$J\left(I(X_{\tau+1}; Y_{\tau+1} | Y^\tau)\right) \ge E\left[\ln \frac{P(Y_{\tau+1} | Y^\tau)}{P_{\{\mathcal{A}_i\}}(Y_{\tau+1} | Y^\tau)} \,\middle|\, Y^\tau\right]. \tag{109}$$

Using equations (107), (108) and (109) we get

$$E[U_{\tau+1} | Y^\tau] \ge U_\tau. \tag{110}$$

Recall that $\min_{x,y} W_x(y) = \lambda$ and $|J(\cdot)| \le D$. Thus as a result of equation (107) we have

$$E\left[|U_{\tau+1} - U_\tau| \,\middle|\, Y^\tau\right] \le \ln\frac{1}{\lambda} + D. \tag{111}$$

As a result of (110), (111) and the fact that $U_0 = 0$, $U_\tau$ is a submartingale.

Recall that we have assumed that $P[T_{j+1} \le T] = 1$ and $E[T] < \infty$; consequently

$$E[T_{j+1}] < \infty. \tag{112}$$
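Because of (111) and (112) we can apply a version of Doob's optional stopping theorem [10, Theorem 2, p 487] to the submartingale $U_\tau$ and the stopping time $T_{j+1}$ to obtain $E[U_{T_{j+1}}] \ge E[U_0] = 0$. Consequently,

$$E\left[\ln \frac{P(Y_{T_j+1}^{T_{j+1}} | Y^{T_j})}{P_{\{\mathcal{A}_i\}}(Y_{T_j+1}^{T_{j+1}} | Y^{T_j})}\right] \le E\left[\sum_{\tau = T_j+1}^{T_{j+1}} J\left(I(M; Y_\tau | Y^{\tau-1})\right)\right]. \tag{113}$$

Note that as a result of the concavity of $J(\cdot)$ and Jensen's inequality we have

$$E\left[\sum_{\tau = T_j+1}^{T_{j+1}} J\left(I(M; Y_\tau | Y^{\tau-1})\right)\right] = E[T_{j+1} - T_j]\, E\left[\sum_{\tau \ge 1} \frac{\mathbb{1}_{\{T_j < \tau \le T_{j+1}\}}}{E[T_{j+1} - T_j]} J\left(I(M; Y_\tau | Y^{\tau-1})\right)\right] \le E[T_{j+1} - T_j]\, J\left(\frac{E\left[\sum_{\tau = T_j+1}^{T_{j+1}} I(M; Y_\tau | Y^{\tau-1})\right]}{E[T_{j+1} - T_j]}\right). \tag{114}$$

In order to calculate the argument of $J(\cdot)$ in (114) consider the stochastic sequence

$$V_\tau = H(M | Y^\tau) + \sum_{k=1}^{\tau} I(M; Y_k | Y^{k-1}). \tag{115}$$

Clearly $E[V_{\tau+1} | Y^\tau] = V_\tau$ and $E[|V_\tau|] \le \ln|\mathcal{M}| + C\tau < \infty$. Hence $V_\tau$ is a martingale. Furthermore,

$$E\left[|V_{\tau+1} - V_\tau| \,\middle|\, Y^\tau\right] \le \ln|\mathcal{M}| + C. \tag{116}$$

Recall that we have assumed that $P[T_j \le T_{j+1} \le T] = 1$ and $E[T] < \infty$; consequently

$$E[T_j] \le E[T_{j+1}] < \infty. \tag{117}$$

As a result of equations (116) and (117) we can apply Doob's optional stopping theorem [10, Theorem 2, p 487] to $V_\tau$ both at stopping time $T_j$ and at stopping time $T_{j+1}$, i.e., $E[V_{T_{j+1}}] = E[V_0]$ and $E[V_{T_j}] = E[V_0]$. Consequently,

$$E\left[\sum_{\tau = T_j+1}^{T_{j+1}} I(M; Y_\tau | Y^{\tau-1})\right] = E\left[H(M | Y^{T_j}) - H(M | Y^{T_{j+1}})\right]. \tag{118}$$

Using equations (113), (114) and (118) we get

$$E\left[\ln \frac{P(Y_{T_j+1}^{T_{j+1}} | Y^{T_j})}{P_{\{\mathcal{A}_i\}}(Y_{T_j+1}^{T_{j+1}} | Y^{T_j})}\right] \le E[T_{j+1} - T_j]\, J\left(\frac{E\left[H(M | Y^{T_j}) - H(M | Y^{T_{j+1}})\right]}{E[T_{j+1} - T_j]}\right). \tag{119}$$

Hence the inequality given in (105) holds not only when $P[T_{j+1} = T_j] = 1$ but also when $P[T_{j+1} = T_j] < 1$.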



F. Proof of Lemma 6 for The Case $E[T] < \infty$

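Proof: In order to bound $P_{e|m}$ from below we apply Lemma 5 for $(T_1, \mathcal{A}_1)$ and $(T_2, \mathcal{A}_2)$ given in equations (48), (49), (50) and (51) and use the fact that $J(\cdot) \le D$ to get

$$\ln P_{e|m} \ge \frac{-h\left(P_e + |\mathcal{M}|^{-1}\right) - E[T_2]\, J\left(\frac{E[H(M) - H(M | Y^{T_2})]}{E[T_2]}\right) - E[T - T_2]\, D}{1 - P_e - |\mathcal{M}|^{-1}} \tag{120a}$$

$$\ln P_{\{\mathcal{A}_2\}}\left[\hat{M} \notin \mathcal{A}_2(Y^{T_2})\right] \ge \frac{-h\left(P_e + P[M \in \mathcal{A}_2(Y^{T_2})]\right) - E[T - T_2]\, D}{1 - P_e - P[M \in \mathcal{A}_2(Y^{T_2})]} \tag{120b}$$

provided that $|\mathcal{M}|^{-1} + P_e \le 1/2$ and $P[M \in \mathcal{A}_2(Y^{T_2})] + P_e \le 1/2$.

We start with bounding $P_{\{\mathcal{A}_2\}}[\hat{M} \notin \mathcal{A}_2(Y^{T_2})]$ from above and $P[M \notin \mathcal{A}_2(Y^{T_2})]$ from below.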

• Since $\min_{x \in \mathcal{X}, y \in \mathcal{Y}} W_x(y) = \lambda$, the posterior probability of a message at time $\tau + 1$ can not be smaller than $\lambda$ times the posterior probability of the same message at time $\tau$. Hence for the stopping time $T_2$ defined in equation (50), the random set $\mathcal{A}_2$ defined in equation (51) and $\delta < \frac{1}{2}$ we have

$$P\left[M \in \mathcal{A}_2(Y^{T_2}) \,\middle|\, Y^{T_2} = y^{t_2}\right] \ge \lambda\delta \qquad \forall y^{t_2} \in \mathcal{Y}^{T_2 *}. \tag{121}$$

The set $\mathcal{A}_2$ is random in the sense that it depends on previous channel outputs.
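The one-step posterior bound used above can be checked numerically. The following is a minimal sketch, assuming a binary symmetric channel with crossover probability 0.1 (so $\lambda = 0.1$); the helper names `posterior_update` and `lambda_bound_holds` are illustrative and not taken from the paper.

```python
import random

def posterior_update(posterior, inputs, channel, y):
    """One Bayes step. posterior[m] is P[M = m | past outputs], inputs[m] is the
    channel input message m would send now, channel[x][y] is W_x(y)."""
    weights = [p * channel[x][y] for p, x in zip(posterior, inputs)]
    total = sum(weights)  # this is P(y | past outputs), hence at most 1
    return [w / total for w in weights]

def lambda_bound_holds(channel, num_messages=5, trials=1000, seed=0):
    """Check new_posterior[m] >= lambda * old_posterior[m] on random instances."""
    rng = random.Random(seed)
    lam = min(min(row) for row in channel)
    num_inputs, num_outputs = len(channel), len(channel[0])
    for _ in range(trials):
        raw = [rng.random() + 1e-9 for _ in range(num_messages)]
        prior = [v / sum(raw) for v in raw]
        # Arbitrary (feedback-dependent) choice of input per message.
        inputs = [rng.randrange(num_inputs) for _ in range(num_messages)]
        for y in range(num_outputs):
            new = posterior_update(prior, inputs, channel, y)
            if any(n < lam * p - 1e-12 for n, p in zip(new, prior)):
                return False
    return True

# Assumed example channel: BSC with crossover 0.1.
BSC = [[0.9, 0.1], [0.1, 0.9]]
```

The bound holds because the normalizing term is a probability (at most one) while every likelihood factor is at least $\lambda$.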
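As a result of the definition of $P_{\{\mathcal{A}_2\}}(m, y^t)$ given in equation (39) we have,

$$P_{\{\mathcal{A}_2\}}(m, y^t) \le P(m, y^t)\, \frac{\mathbb{1}_{\{m \in \mathcal{A}_2(y^{t_2})\}}}{\lambda\delta} \qquad \forall m \in \mathcal{M},\ y^t \in \mathcal{Y}^{T*}. \tag{122}$$

If the decoded message $\hat{M}(y^t)$ is not in $\mathcal{A}_2(y^{t_2})$ and message $m$ is in $\mathcal{A}_2(y^{t_2})$ then $\hat{M}(y^t) \neq m$:

$$\mathbb{1}_{\{\hat{M}(y^t) \notin \mathcal{A}_2(y^{t_2})\}}\, \mathbb{1}_{\{m \in \mathcal{A}_2(y^{t_2})\}} \le \mathbb{1}_{\{\hat{M}(y^t) \neq m\}} \qquad \forall m \in \mathcal{M},\ y^t \in \mathcal{Y}^{T*}. \tag{123}$$

Using equations (122) and (123) we get

$$P_{\{\mathcal{A}_2\}}(m, y^t)\, \mathbb{1}_{\{\hat{M}(y^t) \notin \mathcal{A}_2(y^{t_2})\}} \le P(m, y^t)\, \frac{\mathbb{1}_{\{\hat{M}(y^t) \neq m\}}}{\lambda\delta} \qquad \forall m \in \mathcal{M},\ y^t \in \mathcal{Y}^{T*}. \tag{124}$$

If we sum over all $(m, y^t)$'s in $\mathcal{M} \times \mathcal{Y}^{T*}$ and use equations (37) and (41) we get,

$$P_{\{\mathcal{A}_2\}}\left[\hat{M} \notin \mathcal{A}_2(Y^{T_2})\right] \le \frac{P[\hat{M} \neq M]}{\lambda\delta}. \tag{125}$$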



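The probability of an event $\Gamma$ is lower bounded by the probability of its intersection with any event $\Upsilon$, i.e., $P[\Gamma] \ge P[\{\Gamma \cap \Upsilon\}]$:

$$P\left[\hat{M} \neq M\right] \ge P\left[\{\hat{M} \neq M,\ \mathcal{A}_2(Y^{T_2}) = \mathcal{M}\}\right] = P\left[\hat{M} \neq M \,\middle|\, \mathcal{A}_2(Y^{T_2}) = \mathcal{M}\right] P\left[\mathcal{A}_2(Y^{T_2}) = \mathcal{M}\right]. \tag{126}$$

Note that if $\mathcal{A}_2(y^{t_2}) = \mathcal{M}$ then $T$ is reached before any of the messages reach a posterior probability of $1 - \delta$. Thus

$$P\left[\hat{M} \neq M \,\middle|\, \mathcal{A}_2(Y^{T_2}) = \mathcal{M}\right] \ge \delta. \tag{127}$$

Thus as a result of equations (126) and (127) we have

$$P\left[\mathcal{A}_2(Y^{T_2}) = \mathcal{M}\right] \le \frac{P_e}{\delta}. \tag{128}$$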



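On the other hand if $\mathcal{A}_2(y^{t_2}) \neq \mathcal{M}$, then the most likely message, with a probability at least $(1 - \delta)$, is excluded from $\mathcal{A}_2(y^{t_2})$. Thus

$$P\left[M \in \mathcal{A}_2(Y^{T_2}) \,\middle|\, \mathcal{A}_2(Y^{T_2}) \neq \mathcal{M}\right] \le \delta. \tag{129}$$

Using equations (128) and (129) together with the total probability formula we get

$$\begin{aligned} P\left[M \in \mathcal{A}_2(Y^{T_2})\right] &= P\left[M \in \mathcal{A}_2(Y^{T_2}) \,\middle|\, \mathcal{A}_2(Y^{T_2}) = \mathcal{M}\right] P\left[\mathcal{A}_2(Y^{T_2}) = \mathcal{M}\right] + P\left[M \in \mathcal{A}_2(Y^{T_2}) \,\middle|\, \mathcal{A}_2(Y^{T_2}) \neq \mathcal{M}\right] P\left[\mathcal{A}_2(Y^{T_2}) \neq \mathcal{M}\right] \\ &\le P\left[\mathcal{A}_2(Y^{T_2}) = \mathcal{M}\right] + P\left[M \in \mathcal{A}_2(Y^{T_2}) \,\middle|\, \mathcal{A}_2(Y^{T_2}) \neq \mathcal{M}\right] \\ &\le \frac{P_e}{\delta} + \delta. \end{aligned} \tag{130}$$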



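We plug the bounds on $P_{\{\mathcal{A}_2\}}[\hat{M} \notin \mathcal{A}_2(Y^{T_2})]$ and $P[M \notin \mathcal{A}_2(Y^{T_2})]$ given in equations (125) and (130) in equation (120) to get

$$\ln P_{e|m} \ge \frac{-h(\tilde{\epsilon}_1) - E[T_2]\, J\left(\frac{E[H(M) - H(M | Y^{T_2})]}{E[T_2]}\right) - E[T - T_2]\, D}{1 - \tilde{\epsilon}_1} \tag{131a}$$

$$\ln \frac{P_e}{\lambda\delta} \ge \frac{-h(\tilde{\epsilon}_1) - E[T - T_2]\, D}{1 - \tilde{\epsilon}_1} \tag{131b}$$

provided that $\tilde{\epsilon}_1 \le 1/2$ where $\tilde{\epsilon}_1 = P_e + \delta + \frac{P_e}{\delta} + |\mathcal{M}|^{-1}$.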

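Now we bound $E[H(M | Y^{T_2})]$ from above. Note that $\mathbb{1}_{\{M \in \mathcal{A}_2(Y^{T_2})\}}$ is a discrete random variable that is either zero or one; its conditional entropy given $Y^{T_2}$ is given by

$$H\left(\mathbb{1}_{\{M \in \mathcal{A}_2(Y^{T_2})\}} \,\middle|\, Y^{T_2}\right) = h\left(P\left[M \in \mathcal{A}_2(Y^{T_2}) \,\middle|\, Y^{T_2}\right]\right). \tag{132}$$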
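Furthermore since $\mathbb{1}_{\{M \in \mathcal{A}_2(Y^{T_2})\}}$ is a function of $Y^{T_2}$ and $M$, the chain rule of entropy implies that

$$H(M | Y^{T_2}) = H\left(\mathbb{1}_{\{M \in \mathcal{A}_2(Y^{T_2})\}} \,\middle|\, Y^{T_2}\right) + E\left[H\left(M \,\middle|\, Y^{T_2}, \mathbb{1}_{\{M \in \mathcal{A}_2(Y^{T_2})\}}\right) \,\middle|\, Y^{T_2}\right]. \tag{133}$$

Since $\mathcal{A}_2(Y^{T_2})$ has at most $|\mathcal{M}|$ elements and its complement, $\mathcal{M} \setminus \mathcal{A}_2(Y^{T_2})$, has at most one element, we can bound the conditional entropy of the messages as follows

$$H\left(M \,\middle|\, Y^{T_2}, \mathbb{1}_{\{M \in \mathcal{A}_2(Y^{T_2})\}}\right) \le \mathbb{1}_{\{M \in \mathcal{A}_2(Y^{T_2})\}} \ln|\mathcal{M}|. \tag{134}$$

Thus using equations (132), (133) and (134) we get

$$H(M | Y^{T_2}) \le h\left(P\left[M \in \mathcal{A}_2(Y^{T_2}) \,\middle|\, Y^{T_2}\right]\right) + P\left[M \in \mathcal{A}_2(Y^{T_2}) \,\middle|\, Y^{T_2}\right] \ln|\mathcal{M}|. \tag{135}$$

Then using concavity of the binary entropy function $h(\cdot)$ together with equations (130) and (135) we get

$$E\left[H(M | Y^{T_2})\right] \le h\left(\delta + \frac{P_e}{\delta}\right) + \left(\delta + \frac{P_e}{\delta}\right) \ln|\mathcal{M}| \tag{136}$$

provided that $\delta + \frac{P_e}{\delta} \le 1/2$.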
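The pointwise entropy bound behind this step can be illustrated on a concrete posterior. The sketch below assumes the rule described in the text for the set $\mathcal{A}_2$: if some message has posterior at least $1 - \delta$ it is excluded from $\mathcal{A}_2$, otherwise $\mathcal{A}_2$ is the whole message set. The function names and the example posterior are illustrative only.

```python
import math

def binary_entropy(q):
    """Binary entropy h(q) in nats, with h(0) = h(1) = 0."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -q * math.log(q) - (1.0 - q) * math.log(1.0 - q)

def entropy(posterior):
    """Shannon entropy of a probability vector, in nats."""
    return -sum(p * math.log(p) for p in posterior if p > 0.0)

def entropy_bound(posterior, delta):
    """h(q) + q ln|M|, where q = P[M in A_2 | outputs] under the assumed rule:
    exclude the most likely message when its posterior is at least 1 - delta."""
    top = max(posterior)
    q = 1.0 - top if top >= 1.0 - delta else 1.0
    return binary_entropy(q) + q * math.log(len(posterior))

# Assumed example: four messages, one of them nearly certain.
peaked = [0.97, 0.01, 0.01, 0.01]
uniform = [0.25, 0.25, 0.25, 0.25]
```

For the peaked posterior the bound is close to the true entropy, which is what makes the resulting constraint on the rate useful; for the uniform posterior it degenerates to $\ln|\mathcal{M}|$.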

If we plug in equation dl36l ) and the identity H(M) = In \A4\ in equation (11311) we get, 



(1 - h 



InP, 



> Kh) _ j I (1— £i)R— /i(ei)/E[T] 
-ft(ei)+lnA(5 



:i-ry)Z> 



:i-ei)E> -"iffip" - (1 - >7)^ 

E[T 2 ] ~ _ D , j 



lnP e 



provided that h < 1/2 where 77 = ei = P e + * + ^ + l-Ml -1 , R = ^ and E - T 
Note that the inequality given in equation (1137bb bounds the value of 77 from above, 



n 



< , _ (l-gQE-ga 



(137a) 
(137b) 



(138) 



where I2 



fc(ei)-lnA<5 



Furthermore for any 771 < 772 < § as a result of concavity of J(-) we have 

m + (1 - = mJ(%) + im - vi)J(o) + (1 - m)D 



<V2J[%)+(1-V2)D. 



(139) 



Using equations { n 



ei j, E 62 if E > and by its value at 77 = 1 otherwise, i.e., 



lnP e 



E[T] 



> 



-E 



R-- 



1-- 



D 



1-ei 1-ei 



J((l-ei)R-e 2 ) 



if E > ^4- 

— 1— ei 

if E < 



where e 



gi£>+g 2 

1-ei ■ 



Then, for the case E > Lemma [6] follows from the fact that J(-) is a non-negative decreasing function. For 
the case E < in Lemma [6] follows from the fact that </(•) is a concave non-negative decreasing function. 



G. Proof of Lemma 7 for The Case $E[T] < \infty$

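Proof: We start with proving the bounds given in equations (57) and (58).
• Let us start with the bound on $P_{\{\mathcal{A}_i\}}[\hat{M} \notin \mathcal{A}_i(Y^{T_i})]$ given in equation (57). Since $\min_{x \in \mathcal{X}, y \in \mathcal{Y}} W_x(y) = \lambda$, the posterior probability of an $m^i \in \mathcal{M}^i$ at time $\tau + 1$ can not be smaller than $\lambda$ times its value at time $\tau$. Hence for $\delta < 1/2$, as a result of the definitions of $T_i$ and $\mathcal{A}_i(Y^{T_i})$ given in equations (55) and (56), we have

$$P\left[M \in \mathcal{A}_i(Y^{T_i}) \,\middle|\, Y^{T_i} = y^{t_i}\right] \ge \lambda\delta \qquad \forall y^{t_i} \in \mathcal{Y}^{T_i *},\ i \in \{1, 2, \ldots, \ell\}.$$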
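Then as a result of the definition of $P_{\{\mathcal{A}_i\}}(m, y^t)$ given in equation (39) we have,

$$P_{\{\mathcal{A}_i\}}(m, y^t) \le P(m, y^t)\, \frac{\mathbb{1}_{\{m \in \mathcal{A}_i(y^{t_i})\}}}{\lambda\delta} \qquad \forall m \in \mathcal{M},\ y^t \in \mathcal{Y}^{T*},\ i \in \{1, 2, \ldots, \ell\}. \tag{140}$$

For $\mathcal{A}_i(y^{t_i})$ given in equation (56), if the decoded message $\hat{M}(y^t)$ is not in $\mathcal{A}_i(y^{t_i})$ but $m$ is in $\mathcal{A}_i(y^{t_i})$ then $\hat{M}^i(y^t) \neq m^i$:

$$\mathbb{1}_{\{\hat{M}(y^t) \notin \mathcal{A}_i(y^{t_i})\}}\, \mathbb{1}_{\{m \in \mathcal{A}_i(y^{t_i})\}} \le \mathbb{1}_{\{\hat{M}^i(y^t) \neq m^i\}} \qquad \forall m \in \mathcal{M},\ y^t \in \mathcal{Y}^{T*},\ i \in \{1, 2, \ldots, \ell\}. \tag{141}$$

Using equations (140) and (141) we get

$$P_{\{\mathcal{A}_i\}}(m, y^t)\, \mathbb{1}_{\{\hat{M}(y^t) \notin \mathcal{A}_i(y^{t_i})\}} \le P(m, y^t)\, \frac{\mathbb{1}_{\{\hat{M}^i(y^t) \neq m^i\}}}{\lambda\delta} \qquad \forall m \in \mathcal{M},\ y^t \in \mathcal{Y}^{T*},\ i \in \{1, 2, \ldots, \ell\}.$$

If we sum over all $(m, y^t)$'s in $\mathcal{M} \times \mathcal{Y}^{T*}$ and use equations (37) and (41) we get,

$$P_{\{\mathcal{A}_i\}}\left[\hat{M} \notin \mathcal{A}_i(Y^{T_i})\right] \le \frac{P[\hat{M}^i \neq M^i]}{\lambda\delta} \qquad \forall i \in \{1, 2, \ldots, \ell\}. \tag{142}$$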



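Let us now prove the bound on $P[M \in \mathcal{A}_i(Y^{T_i})]$ given in equation (58).
- If $\mathcal{A}_i(Y^{T_i}) \neq \mathcal{M}$, then at $T_i$ there is an $m^i$ with posterior probability $(1 - \delta)$ and all the messages $m$ of the form $m = (m^i, m_{i+1}, \ldots, m_\ell)$ are excluded from $\mathcal{A}_i$. Consequently we have

$$P\left[M \in \mathcal{A}_i(Y^{T_i}) \,\middle|\, \mathcal{A}_i(Y^{T_i}) \neq \mathcal{M}\right] \le \delta. \tag{143}$$

- If $\mathcal{A}_i(Y^{T_i}) = \mathcal{M}$, then at $T_i$ there is no $m^i$ with posterior probability $(1 - \delta)$ and $T_i = T$. Since $\hat{M}^i \neq M^i$ implies that $\hat{M} \neq M$ we have

$$P\left[\hat{M} \neq M \,\middle|\, \mathcal{A}_i(Y^{T_i}) = \mathcal{M}\right] \ge \delta. \tag{144}$$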



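As a result of the total probability formula for $P[\hat{M} \neq M]$ we have

$$P_e = P\left[\hat{M} \neq M \,\middle|\, \mathcal{A}_i(Y^{T_i}) = \mathcal{M}\right] P\left[\mathcal{A}_i(Y^{T_i}) = \mathcal{M}\right] + P\left[\hat{M} \neq M \,\middle|\, \mathcal{A}_i(Y^{T_i}) \neq \mathcal{M}\right] P\left[\mathcal{A}_i(Y^{T_i}) \neq \mathcal{M}\right] \ge \delta\, P\left[\mathcal{A}_i(Y^{T_i}) = \mathcal{M}\right]. \tag{145}$$

If we use the total probability formula for $P[M \in \mathcal{A}_i(Y^{T_i})]$ together with equations (143) and (145) we get

$$\begin{aligned} P\left[M \in \mathcal{A}_i(Y^{T_i})\right] &= P\left[\{M \in \mathcal{A}_i(Y^{T_i}),\ \mathcal{A}_i(Y^{T_i}) \neq \mathcal{M}\}\right] + P\left[\{M \in \mathcal{A}_i(Y^{T_i}),\ \mathcal{A}_i(Y^{T_i}) = \mathcal{M}\}\right] \\ &\le P\left[M \in \mathcal{A}_i(Y^{T_i}) \,\middle|\, \mathcal{A}_i(Y^{T_i}) \neq \mathcal{M}\right] + P\left[\mathcal{A}_i(Y^{T_i}) = \mathcal{M}\right] \\ &\le \delta + \frac{P_e}{\delta}. \end{aligned}$$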



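We apply Lemma 5 for $(T_1, \mathcal{A}_1), \ldots, (T_k, \mathcal{A}_k)$ with $k = \ell$, defined in equations (55) and (56), and use the bounds on $P_{\{\mathcal{A}_i\}}[\hat{M} \notin \mathcal{A}_i(Y^{T_i})]$ and $P[M \in \mathcal{A}_i(Y^{T_i})]$ given in (57) and (58). Then we can conclude that if $P_e + \delta + P_e/\delta \le 1/2$ then

$$(1 - \tilde{\epsilon}_3) E_i \le \tilde{\epsilon}_5 + \sum_{j=i+1}^{\ell+1} \nu_j J(r_j) \qquad i = 1, 2, \ldots, \ell \tag{146}$$

where $R_i$, $E_i$, $\tilde{\epsilon}_3$ and $\tilde{\epsilon}_5$ are defined in Lemma 7, $r_j$'s are defined in equation (47) of Lemma 5 and $\nu_j$'s are defined as follows⁴

$$\nu_j \triangleq \frac{E[T_j] - E[T_{j-1}]}{E[T]} \qquad \forall j \in \{1, 2, \ldots, \ell + 1\}. \tag{147}$$

⁴ We use the convention $T_0 = 0$ and $T_{\ell+1} = T$.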
Depending on the values of $\nu_j$ and $r_j$ the bound in equation (146) takes different values. However $\nu_j$ and $r_j$ are not changing freely. As a result of equation (118) and the fact that $I(M; Y_{t+1} | Y^t) \le C$ we have

$$r_j \le C \qquad j \in \{1, 2, \ldots, (\ell + 1)\}. \tag{148}$$

In addition $\nu_j$'s and $r_j$'s are constrained by the definitions of $T_j$ and $\mathcal{A}_j(Y^{T_j})$ given in equations (55) and (56). At $T_j$, with high probability one element of $\mathcal{M}^j$ has a posterior probability $(1 - \delta)$. Below we use this fact to bound $E[H(M | Y^{T_j})]$ from above. Then we turn this bound into a constraint on the values of $\nu_j$'s and $r_j$'s and use that constraint together with equations (146) and (148) to bound $E_j$'s from above.

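For all $i$ in $\{1, 2, \ldots, \ell\}$, $\mathbb{1}_{\{M \in \mathcal{A}_i(Y^{T_i})\}}$ is a discrete random variable that is either zero or one; its conditional entropy given $Y^{T_i}$ is given by

$$H\left(\mathbb{1}_{\{M \in \mathcal{A}_i(Y^{T_i})\}} \,\middle|\, Y^{T_i}\right) = h\left(P\left[M \in \mathcal{A}_i(Y^{T_i}) \,\middle|\, Y^{T_i}\right]\right). \tag{149}$$

Furthermore since $\mathbb{1}_{\{M \in \mathcal{A}_i(Y^{T_i})\}}$ is a function of $Y^{T_i}$ and $M$, the chain rule of entropy implies that

$$H(M | Y^{T_i}) = H\left(\mathbb{1}_{\{M \in \mathcal{A}_i(Y^{T_i})\}} \,\middle|\, Y^{T_i}\right) + E\left[H\left(M \,\middle|\, Y^{T_i}, \mathbb{1}_{\{M \in \mathcal{A}_i(Y^{T_i})\}}\right) \,\middle|\, Y^{T_i}\right]. \tag{150}$$

Note that $\mathcal{A}_i(Y^{T_i})$ has at most $|\mathcal{M}|$ elements and its complement, $\mathcal{M} \setminus \mathcal{A}_i(Y^{T_i})$, has at most $\frac{|\mathcal{M}|}{|\mathcal{M}^i|}$ elements. We can bound the conditional entropy of the messages $H(M | Y^{T_i}, \mathbb{1}_{\{M \in \mathcal{A}_i(Y^{T_i})\}})$ as follows

$$\begin{aligned} H\left(M \,\middle|\, Y^{T_i}, \mathbb{1}_{\{M \in \mathcal{A}_i(Y^{T_i})\}}\right) &\le \mathbb{1}_{\{M \in \mathcal{A}_i(Y^{T_i})\}} \ln|\mathcal{M}| + \mathbb{1}_{\{M \notin \mathcal{A}_i(Y^{T_i})\}} \ln\frac{|\mathcal{M}|}{|\mathcal{M}^i|} \\ &= \ln\frac{|\mathcal{M}|}{|\mathcal{M}^i|} + \mathbb{1}_{\{M \in \mathcal{A}_i(Y^{T_i})\}} \ln|\mathcal{M}^i|. \end{aligned} \tag{151}$$

Thus using equations (149), (150) and (151) we get

$$H(M | Y^{T_i}) \le h\left(P\left[M \in \mathcal{A}_i(Y^{T_i}) \,\middle|\, Y^{T_i}\right]\right) + \ln\frac{|\mathcal{M}|}{|\mathcal{M}^i|} + P\left[M \in \mathcal{A}_i(Y^{T_i}) \,\middle|\, Y^{T_i}\right] \ln|\mathcal{M}^i|. \tag{152}$$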



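If we take the expectation of both sides of the inequality (152) and use the concavity of the binary entropy function we get

$$E\left[H(M | Y^{T_i})\right] \le h\left(P\left[M \in \mathcal{A}_i(Y^{T_i})\right]\right) + \ln\frac{|\mathcal{M}|}{|\mathcal{M}^i|} + P\left[M \in \mathcal{A}_i(Y^{T_i})\right] \ln|\mathcal{M}^i|.$$

Using the inequality given in (58) and the fact that the binary entropy function is an increasing function on the interval $[0, 1/2]$ we see that

$$E\left[H(M | Y^{T_i})\right] \le h\left(P_e + \delta + \frac{P_e}{\delta}\right) + \ln\frac{|\mathcal{M}|}{|\mathcal{M}^i|} + \left(P_e + \delta + \frac{P_e}{\delta}\right) \ln|\mathcal{M}^i| \tag{153}$$

provided that $P_e + \delta + \frac{P_e}{\delta} \le 1/2$.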

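Note that as a result of Fano's inequality for $E[H(M | Y^T)]$ we have

$$E\left[H(M | Y^T)\right] \le h(P_e) + P_e \ln|\mathcal{M}|. \tag{154}$$

If we divide both sides of the inequalities (153) and (154) by $E[T]$, we see that the following bounds hold

$$\frac{E[H(M | Y^{T_i})]}{E[T]} \le \tilde{\epsilon}_4 + R - (1 - \tilde{\epsilon}_3) \sum_{j=1}^{i} R_j \qquad i = 1, 2, \ldots, \ell \tag{155a}$$

$$\frac{E[H(M | Y^T)]}{E[T]} \le \frac{h(P_e)}{E[T]} + P_e R. \tag{155b}$$

Note that

$$\frac{E[H(M | Y^{T_i})]}{E[T]} = R - \sum_{j=1}^{i} \nu_j r_j \qquad i = 1, 2, \ldots, \ell + 1. \tag{156}$$

Using equations (155) and (156) we get,

$$(1 - \tilde{\epsilon}_3) \sum_{j=1}^{i} R_j - \tilde{\epsilon}_4 \le \sum_{j=1}^{i} \nu_j r_j \qquad i = 1, 2, \ldots, \ell \tag{157a}$$

$$(1 - P_e) R - \frac{h(P_e)}{E[T]} \le \sum_{j=1}^{\ell+1} \nu_j r_j \tag{157b}$$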
where $r_j$'s and $\nu_j$'s are given in equations (47) and (147) respectively.

Thus using equations (47), (146), (147), (148) and (157) we reach the following conclusion. For any variable length block code satisfying the hypothesis of Lemma 7 and for any positive $\delta$ such that $P_e + \delta + \frac{P_e}{\delta} \le \frac{1}{5}$

$$(1 - \tilde{\epsilon}_3) E_i - \tilde{\epsilon}_5 \le \sum_{j=i+1}^{\ell+1} \nu_j J(r_j) \qquad i = 1, 2, \ldots, \ell \tag{158a}$$

$$(1 - \tilde{\epsilon}_3) \sum_{j=1}^{i} R_j - \tilde{\epsilon}_4 \le \sum_{j=1}^{i} \nu_j r_j \qquad i = 1, 2, \ldots, \ell \tag{158b}$$

$$(1 - P_e) R - \frac{h(P_e)}{E[T]} \le \sum_{j=1}^{\ell+1} \nu_j r_j \tag{158c}$$

for some $(\nu_1^{\ell+1}, r_1^{\ell+1})$ such that

$$r_i \in [0, C] \qquad i = 1, 2, \ldots, (\ell + 1) \tag{159a}$$

$$\nu_i \ge 0 \qquad i = 1, 2, \ldots, (\ell + 1) \tag{159b}$$

$$\sum_{i=1}^{\ell+1} \nu_i = 1. \tag{159c}$$

We show below that if the constraints given in equation (158) are satisfied for some $(\nu_1^{\ell+1}, r_1^{\ell+1})$ satisfying (159), then the constraints given in (60) are satisfied for some $\vec{\eta}$ satisfying (61).
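One can confirm numerically that

$$(1 - \tilde{\epsilon}_3) \ln 2 \ge h(\tilde{\epsilon}_3) \qquad \forall \tilde{\epsilon}_3 \in \left[0, \tfrac{1}{5}\right].$$

Recall that we have assumed that $P_e + \delta + \frac{P_e}{\delta} \le \frac{1}{5}$, i.e., $\tilde{\epsilon}_3 \le \frac{1}{5}$. Thus,

$$(1 - \tilde{\epsilon}_3) R_1 - \tilde{\epsilon}_4 \ge 0. \tag{160}$$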
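The numerical confirmation mentioned above can be reproduced with a few lines. This is a minimal sketch with natural logarithms throughout; the function names, the grid size and the bisection bracket are choices made here for illustration, not details from the paper.

```python
import math

def binary_entropy(x):
    """Binary entropy h(x) in nats, with h(0) = h(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log(x) - (1.0 - x) * math.log(1.0 - x)

def gap(x):
    """(1 - x) ln 2 - h(x); the inequality holds wherever this is nonnegative."""
    return (1.0 - x) * math.log(2.0) - binary_entropy(x)

def holds_on_grid(upper, points=10001):
    """Check gap(x) >= 0 on an evenly spaced grid over [0, upper]."""
    return all(gap(upper * k / (points - 1)) >= 0.0 for k in range(points))

def crossing(lo=0.2, hi=0.25, iterations=60):
    """Bisection for the point where gap changes sign (gap(lo) > 0 > gap(hi))."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if gap(mid) >= 0.0:
            lo = mid
        else:
            hi = mid
    return lo
```

On this check the inequality holds on all of $[0, 1/5]$ and already fails at $1/4$; since the gap is decreasing on $[0, 1/2]$, the sign change lies strictly between these two values.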

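Let $\eta_1$, $\tilde{r}_1$, $\tilde{\nu}_2$ and $\tilde{r}_2$ be

$$\eta_1 = \frac{(1 - \tilde{\epsilon}_3) R_1 - \tilde{\epsilon}_4}{r_1}, \qquad \tilde{r}_1 = r_1, \qquad \tilde{\nu}_2 = \nu_2 + \nu_1 - \eta_1, \qquad \tilde{r}_2 = \frac{r_2 \nu_2 + (\nu_1 - \eta_1) r_1}{\tilde{\nu}_2}.$$

Note that $(\eta_1, \tilde{\nu}_2, \nu_3^{\ell+1}, \tilde{r}_1, \tilde{r}_2, r_3^{\ell+1})$ satisfies (158b), (158c) and (159) by construction. Furthermore as a result of the concavity of $J(\cdot)$ we have,

$$\nu_1 J(r_1) + \nu_2 J(r_2) \le \eta_1 J(\tilde{r}_1) + \tilde{\nu}_2 J(\tilde{r}_2).$$

Thus $(\eta_1, \tilde{\nu}_2, \nu_3^{\ell+1}, \tilde{r}_1, \tilde{r}_2, r_3^{\ell+1})$ also satisfies (158a).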

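For $j \ge 2$ we use $\tilde{\nu}_j$ and $\tilde{r}_j$ to define $\eta_j$, $\tilde{\nu}_{j+1}$ and $\tilde{r}_{j+1}$ as follows:

$$\eta_j = \frac{(1 - \tilde{\epsilon}_3) R_j}{\tilde{r}_j} \tag{161a}$$

$$\tilde{\nu}_{j+1} = \nu_{j+1} + \tilde{\nu}_j - \eta_j \tag{161b}$$

$$\tilde{r}_{j+1} = \frac{r_{j+1} \nu_{j+1} + (\tilde{\nu}_j - \eta_j) \tilde{r}_j}{\tilde{\nu}_{j+1}}. \tag{161c}$$

Using the fact that $(\eta_1^{j-1}, \tilde{\nu}_j, \nu_{j+1}^{\ell+1}, \tilde{r}_1^{j}, r_{j+1}^{\ell+1})$ satisfies (158) and (159) and the concavity of $J(\cdot)$ we can show that $(\eta_1^{j}, \tilde{\nu}_{j+1}, \nu_{j+2}^{\ell+1}, \tilde{r}_1^{j+1}, r_{j+2}^{\ell+1})$ also satisfies (158) and (159). We repeat the iteration given in equation (161) until we reach $\tilde{\nu}_{\ell+1}$ and $\tilde{r}_{\ell+1}$ and we let $\eta_{\ell+1} = \tilde{\nu}_{\ell+1}$.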
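The iteration above is a small reallocation procedure, and a sketch in code may make the bookkeeping easier to follow. In the sketch, `targets[j]` stands for the right-hand-side quantity that each $\eta_j \tilde{r}_j$ must equal (so $(1 - \tilde{\epsilon}_3) R_1 - \tilde{\epsilon}_4$ for the first phase and $(1 - \tilde{\epsilon}_3) R_j$ afterwards); the function names, the example numbers and the quadratic `toy_J` are assumptions made for illustration, not the paper's $J(\cdot)$.

```python
def reallocate(nu, r, targets):
    """Turn (nu, r) of length l+1 into (eta, r_tilde) of length l+1 such that
    eta[j] * r_tilde[j] == targets[j] for the first l phases, while the total
    time share and the total of share * rate are both preserved.
    Assumes every intermediate rate and share stays strictly positive."""
    nu_cur, r_cur = nu[0], r[0]
    eta, r_tilde = [], []
    for j, target in enumerate(targets):
        eta_j = target / r_cur            # share needed at the current rate
        eta.append(eta_j)
        r_tilde.append(r_cur)
        leftover = nu_cur - eta_j         # unused share is pushed to the next phase
        nu_next = nu[j + 1] + leftover
        r_cur = (r[j + 1] * nu[j + 1] + leftover * r_cur) / nu_next
        nu_cur = nu_next
    eta.append(nu_cur)                    # last phase keeps whatever share remains
    r_tilde.append(r_cur)
    return eta, r_tilde

def tail_sums(weights, rates, J):
    """[sum over j > i of weights[j] * J(rates[j])] for i = 0, ..., len - 2."""
    return [sum(w * J(x) for w, x in zip(weights[i + 1:], rates[i + 1:]))
            for i in range(len(weights) - 1)]

def toy_J(x, capacity=0.5, d_max=1.0):
    """Hypothetical concave decreasing stand-in for J, with toy_J(0) = d_max."""
    return d_max * (1.0 - (x / capacity) ** 2)
```

With `nu = [0.5, 0.3, 0.2]`, `r = [0.4, 0.3, 0.1]` and `targets = [0.15, 0.1]`, the output shares still sum to one, and every tail sum of `eta[j] * toy_J(r_tilde[j])` is at least the corresponding tail sum for the original pair, which is the role concavity plays in the argument.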

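Then we conclude that for any variable length block code satisfying the hypothesis of Lemma 7 and for any positive $\delta$ such that $P_e + \delta + \frac{P_e}{\delta} \le \frac{1}{5}$

$$(1 - \tilde{\epsilon}_3) E_i - \tilde{\epsilon}_5 \le \sum_{j=i+1}^{\ell+1} \eta_j J(\tilde{r}_j) \qquad i = 1, 2, \ldots, \ell \tag{162a}$$

$$(1 - \tilde{\epsilon}_3) R_i - \tilde{\epsilon}_4 \mathbb{1}_{\{i=1\}} = \tilde{r}_i \eta_i \qquad i = 1, 2, \ldots, \ell \tag{162b}$$

$$(\tilde{\epsilon}_3 - P_e) R + \frac{h(\tilde{\epsilon}_3) - h(P_e)}{E[T]} \le \tilde{r}_{\ell+1} \eta_{\ell+1} \tag{162c}$$

for some $(\eta_1, \ldots, \eta_{\ell+1}, \tilde{r}_1, \ldots, \tilde{r}_{\ell+1})$ such that⁵

$$\tilde{r}_i \in [0, C] \qquad i = 1, 2, \ldots, (\ell + 1) \tag{163a}$$

$$\eta_i \ge 0 \qquad i = 1, 2, \ldots, (\ell + 1) \tag{163b}$$

$$\sum_{i=1}^{\ell+1} \eta_i = 1. \tag{163c}$$

Then Lemma 7 follows from the fact that $J(\cdot) \le D$.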
H. Codes with Infinite Decoding Time on Channels with Positive Transition Probabilities

In this section we consider variable length block codes on discrete memoryless channels with positive transition probabilities, i.e., $\min_{x \in \mathcal{X}, y \in \mathcal{Y}} W_x(y) > 0$, and derive lower bounds to the probabilities of various error events. These bounds, i.e., equations (166), (172) and (175), enable us to argue that Lemma 6 and Lemma 7 hold for variable length block codes with infinite expected decoding time, i.e., $E[T] = \infty$.

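1) $P_e > 0$: On a discrete memoryless channel such that $\min_{x \in \mathcal{X}, y \in \mathcal{Y}} W_x(y) = \lambda$, the posterior probability of any message $m \in \mathcal{M}$ at time $\tau$ is lower bounded as

$$P\left[M = m \,\middle|\, Y^\tau\right] \ge \left(\frac{\lambda}{1 - \lambda}\right)^{\tau} \frac{1}{|\mathcal{M}|}. \tag{164}$$

Then conditioned on the event $\{T = \tau\}$ the probability of erroneous decoding is lower bounded as

$$P\left[\hat{M} \neq M \,\middle|\, T = \tau\right] \ge \frac{|\mathcal{M}| - 1}{|\mathcal{M}|} \left(\frac{\lambda}{1 - \lambda}\right)^{\tau}. \tag{165}$$

Note that since $P[T < \infty] = 1$, the error probability of any variable length code satisfies

$$P_e = \sum_{\tau=1}^{\infty} P\left[\hat{M} \neq M \,\middle|\, T = \tau\right] P[T = \tau].$$

Using equations (164) and (165) we get

$$P_e \ge \frac{|\mathcal{M}| - 1}{|\mathcal{M}|}\, E\left[\left(\frac{\lambda}{1 - \lambda}\right)^{T}\right]. \tag{166}$$

Note that equation (166) implies that for a variable length code with infinite expected decoding time not only the rate $R$ but also the error exponent $E$ is zero.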
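To get a feel for the size of the floor in equation (166), it can be evaluated for a concrete decoding-time distribution. The sketch below assumes a geometrically distributed $T$, truncated at a large horizon, purely as an example; the function names and parameter values are illustrative and not from the paper.

```python
def geometric_pmf(q, horizon):
    """P[T = t] = (1 - q) q^(t-1) for t = 1..horizon (tail beyond horizon dropped)."""
    return {t: (1.0 - q) * q ** (t - 1) for t in range(1, horizon + 1)}

def error_floor(lam, num_messages, pmf):
    """Lower bound (|M| - 1)/|M| * E[(lam / (1 - lam))^T] on the error probability.
    lam is the smallest transition probability, so 0 < lam <= 1/2."""
    if not 0.0 < lam <= 0.5:
        raise ValueError("lam must lie in (0, 1/2]")
    rho = lam / (1.0 - lam)
    expectation = sum(p * rho ** t for t, p in pmf.items())
    return (num_messages - 1) / num_messages * expectation

# Assumed example: lam = 0.1, four messages, geometric T with q = 0.9.
example_floor = error_floor(0.1, 4, geometric_pmf(0.9, 2000))
```

For a geometric $T$ the expectation has the closed form $(1-q)\rho / (1 - q\rho)$ with $\rho = \lambda/(1-\lambda)$, which the truncated sum reproduces. The floor is strictly positive for any distribution of $T$ that is finite with probability one, whether or not $E[T]$ is finite.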

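2) If $P_e + \frac{1}{|\mathcal{M}|} < 1$ then $\min_m P_{e|m} > 0$: Note that since $P[T < \infty] = 1$ and $|\mathcal{M}| < \infty$,

$$P[T < \infty | M = m] = 1 \qquad \forall m \in \mathcal{M}.$$

For any variable length block code such that $P_e + \frac{1}{|\mathcal{M}|} < 1$, let $\tau^*$ be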



r = min s t : max P[T > r| M = m 

meM 



1 < \Mtl_p \ 

1 - \M\ 



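Since $P[T < \infty | M = m] = 1$ for all $m$ in $\mathcal{M}$ and $\mathcal{M}$ is finite, $\tau^*$ is finite.
Note that for any $\tau$, $m$ and $\tilde{m}$ we have,

$$P\left[Y^\tau = y^\tau \,\middle|\, M = m\right] \ge \left(\frac{\lambda}{1 - \lambda}\right)^{\tau} P\left[Y^\tau = y^\tau \,\middle|\, M = \tilde{m}\right]. \tag{168}$$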

Then using equation (11681 ) we get, 

P[{M = m,T < r*} 



(167) 



(168) 



^e|m > 



M = m 

(j^y ^p[|M = m,T<r* 



M = m 



M = m 



M = m 
- P[T > r* | M = m] 



(169) 



⁵ One can replace the inequality in equation (162c) by equality because $J(\cdot)$ is a decreasing function.
Note that as a result of equation (167) we have,

P[T>r*|M = m]< (%i-P E 

Furthermore 

^ P [M = m 
Thus using equations (11691 ), (11701 ) and (11711) we get 



VmGM 



M = m 



> |M|(1-P e )-1 



mm 

meM 



(170) 
(171) 

(172) 



where $\tau^*$ is a finite integer defined in equation (167).

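3) For all $i \in \{1, 2, \ldots, \ell\}$, $P_e(i) > 0$: For a variable length block code with message set $\mathcal{M}$ of the form $\mathcal{M} = \mathcal{M}_1 \times \mathcal{M}_2 \times \ldots \times \mathcal{M}_\ell$ on a discrete memoryless channel such that $\min_{x \in \mathcal{X}, y \in \mathcal{Y}} W_x(y) = \lambda$, the posterior probability of any element of $\mathcal{M}^i$ at time $\tau$ is lower bounded as

$$P\left[M^i = m^i \,\middle|\, Y^\tau\right] \ge \left(\frac{\lambda}{1 - \lambda}\right)^{\tau} \frac{1}{|\mathcal{M}^i|} \qquad \forall m^i \in \mathcal{M}^i,\ \forall i \in \{1, 2, \ldots, \ell\}. \tag{173}$$

Then conditioned on the event $\{T = \tau\}$ the probability of decoding the $i$-th sub-message erroneously is lower bounded as

$$P\left[\hat{M}^i \neq M^i \,\middle|\, T = \tau\right] \ge \frac{|\mathcal{M}^i| - 1}{|\mathcal{M}^i|} \left(\frac{\lambda}{1 - \lambda}\right)^{\tau}. \tag{174}$$

Since $P[T < \infty] = 1$, $P_e(i)$ satisfies

$$P_e(i) = \sum_{\tau=1}^{\infty} P\left[\hat{M}^i \neq M^i \,\middle|\, T = \tau\right] P[T = \tau].$$

Using equations (173) and (174) we get

$$P_e(i) \ge \frac{|\mathcal{M}^i| - 1}{|\mathcal{M}^i|}\, E\left[\left(\frac{\lambda}{1 - \lambda}\right)^{T}\right] \qquad \forall i \in \{1, 2, \ldots, \ell\}. \tag{175}$$

Equation (175) implies that for any variable length code with infinite expected decoding time on a DMC without any zero probability transition, not only the rates but also the error exponents of the sub-messages are zero.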

I. Proof of Theorem 1

Proof: In Section IV-C it is shown that for any rate $R \in [0, C]$ and error exponent $E \in [0, (1 - \frac{R}{C}) D]$ there exists a reliable sequence $Q$ such that $R_Q = R$, $E_Q = E$ and $E_{md,Q} = E + (1 - \frac{E}{D}) J\left(\frac{R}{1 - E/D}\right)$. Thus as a result of the definition of $E_{md}(R, E)$ given in equation (13) we have

$$E_{md}(R, E) \ge E + \left(1 - \frac{E}{D}\right) J\left(\frac{R}{1 - E/D}\right). \tag{176}$$

In Section V-C we have shown that any reliable sequence of codes $Q$ with rate $R_Q$ and error exponent $E_Q$ satisfies

$$E_{md,Q} \le E_Q + \left(1 - \frac{E_Q}{D}\right) J\left(\frac{R_Q}{1 - E_Q/D}\right).$$

Thus, using the fact that $J(\cdot)$ is a decreasing concave function and the definition of $E_{md}(R, E)$ given in equation (13), we can conclude that

$$E_{md}(R, E) \le E + \left(1 - \frac{E}{D}\right) J\left(\frac{R}{1 - E/D}\right). \tag{177}$$
Thus using equations (176) and (177) we can conclude that

$$E_{md}(R, E) = E + \left(1 - \frac{E}{D}\right) J\left(\frac{R}{1 - E/D}\right). \tag{178}$$



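In order to prove the concavity of $E_{md}(R, E)$ in the $(R, E)$ pair, let $(R_a, E_a)$ and $(R_b, E_b)$ be two pairs such that

$$R_a \in [0, C], \qquad E_a \le \left(1 - \frac{R_a}{C}\right) D \tag{179a}$$

$$R_b \in [0, C], \qquad E_b \le \left(1 - \frac{R_b}{C}\right) D. \tag{179b}$$

Then for any $\alpha \in [0, 1]$ let $R_\alpha$ and $E_\alpha$ be

$$R_\alpha = \alpha R_a + (1 - \alpha) R_b \tag{180a}$$

$$E_\alpha = \alpha E_a + (1 - \alpha) E_b. \tag{180b}$$

From equations (179) and (180) we have

$$R_\alpha \in [0, C], \qquad E_\alpha \le \left(1 - \frac{R_\alpha}{C}\right) D. \tag{181}$$

Furthermore using the concavity of $J(\cdot)$ we get,

$$\begin{aligned} \alpha E_{md}(R_a, E_a) + (1 - \alpha) E_{md}(R_b, E_b) &= \alpha \left[E_a + \left(1 - \frac{E_a}{D}\right) J\left(\frac{R_a}{1 - E_a/D}\right)\right] + (1 - \alpha) \left[E_b + \left(1 - \frac{E_b}{D}\right) J\left(\frac{R_b}{1 - E_b/D}\right)\right] \\ &= E_\alpha + \alpha \left(1 - \frac{E_a}{D}\right) J\left(\frac{R_a}{1 - E_a/D}\right) + (1 - \alpha) \left(1 - \frac{E_b}{D}\right) J\left(\frac{R_b}{1 - E_b/D}\right) \\ &\le E_\alpha + \left(1 - \frac{E_\alpha}{D}\right) J\left(\frac{\alpha R_a + (1 - \alpha) R_b}{1 - E_\alpha/D}\right) \\ &= E_{md}(R_\alpha, E_\alpha). \end{aligned} \tag{182}$$

Thus $E_{md}(R, E)$ is jointly concave in rate exponent pairs.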
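The joint concavity argument can be sanity-checked numerically once a concrete $J(\cdot)$ is fixed. The sketch below uses a hypothetical quadratic stand-in for $J$ (concave, decreasing, with $J(0) = D$), together with assumed values $C = 1$ and $D = 2$; none of these are the channel-dependent quantities of the paper, and the names are illustrative.

```python
import random

CAPACITY = 1.0   # assumed value of C
D_MAX = 2.0      # assumed value of D

def toy_J(x):
    """Hypothetical concave decreasing stand-in for J, with toy_J(0) = D_MAX."""
    return D_MAX * (1.0 - (x / CAPACITY) ** 2)

def e_md(rate, exponent):
    """E + (1 - E/D) J(R / (1 - E/D)), the closed form of equation (178)."""
    share = 1.0 - exponent / D_MAX
    if share <= 0.0:
        return exponent  # E = D forces R = 0 and the second term vanishes
    return exponent + share * toy_J(rate / share)

def sample_pair(rng):
    """Draw (R, E) with R in [0, C] and E in [0, (1 - R/C) D]."""
    rate = rng.uniform(0.0, CAPACITY)
    return rate, rng.uniform(0.0, (1.0 - rate / CAPACITY) * D_MAX)

def concavity_holds(trials=2000, seed=1):
    """Check alpha*f(a) + (1-alpha)*f(b) <= f(alpha*a + (1-alpha)*b) on samples."""
    rng = random.Random(seed)
    for _ in range(trials):
        (ra, ea), (rb, eb) = sample_pair(rng), sample_pair(rng)
        alpha = rng.random()
        mixed = e_md(alpha * ra + (1 - alpha) * rb, alpha * ea + (1 - alpha) * eb)
        if alpha * e_md(ra, ea) + (1 - alpha) * e_md(rb, eb) > mixed + 1e-9:
            return False
    return True
```

The check passes for this stand-in because the second term is a perspective of a concave function, which is exactly the structure the Jensen step in equation (182) exploits.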



J. Proof of Theorem 2

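Proof: In Section IV-D it is shown that for any positive integer $\ell$ a rates-exponents vector $(\vec{R}, \vec{E})$ is achievable if there exists a time sharing vector $\vec{\eta}$ such that,

$$E_i \le \left(1 - \sum_{j=1}^{\ell} \eta_j\right) D + \sum_{j=i+1}^{\ell} \eta_j J\left(\frac{R_j}{\eta_j}\right) \qquad \forall i \in \{1, 2, \ldots, \ell\} \tag{183a}$$

$$R_i \le C \eta_i \qquad \forall i \in \{1, 2, \ldots, \ell\} \tag{183b}$$

$$\eta_i \ge 0 \qquad \forall i \in \{1, 2, \ldots, \ell\} \tag{183c}$$

$$\sum_{j=1}^{\ell} \eta_j \le 1. \tag{183d}$$

Thus the existence of a time sharing vector $\vec{\eta}$ satisfying (183) is a sufficient condition for the achievability of a rates-exponents vector $(\vec{R}, \vec{E})$.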

For any reliable code sequence Q whose message sets are of the form M.^ = A^p x x 
Lemma [7] with 5 = j^p- implies that there exists a sequence fj K such that 

'(l-?3. K )Rj,/ 



x 7W 



(1 - e 3 , K )E i)K - e 5 , K < (1 - V . i r/j K )D + V . 

(1 — e3 )K )Rj iK — e4,«;l{j=i} < C?7i,K 

> 

E i ^ 1 

z — '.7=1 



=i+l 



where R - >* t i.'° F - -^.(O" 2, 



1,2,.. 

1,2,.. 
1,2,... 



hje3, K )-lnXS 



(184a) 

(184b) 
(184c) 

(184d) 
Note that as a result of equation (184) all members of the sequence $\vec{\eta}_\kappa$ are from a compact metric space.⁶ Thus there exists a convergent subsequence, converging to an $\vec{\eta}$. Using equation (184) and the definitions of $R_{Q,i}$ and $E_{Q,i}$ given in the definition of the bit-wise UEP problem, we can conclude that $\vec{\eta}$ satisfies

$$E_{Q,i} \le \left(1 - \sum_{j=1}^{\ell} \eta_j\right) D + \sum_{j=i+1}^{\ell} \eta_j J\left(\frac{R_{Q,j}}{\eta_j}\right) \qquad \forall i \in \{1, 2, \ldots, \ell\} \tag{185a}$$

$$R_{Q,i} \le C \eta_i \qquad \forall i \in \{1, 2, \ldots, \ell\} \tag{185b}$$

$$\eta_i \ge 0 \qquad \forall i \in \{1, 2, \ldots, \ell\} \tag{185c}$$

$$\sum_{j=1}^{\ell} \eta_j \le 1. \tag{185d}$$

According to the definition of the bit-wise UEP problem, a rates-exponents vector $(\vec{R}, \vec{E})$ is achievable only if there exists a reliable code sequence $Q$ such that $(\vec{R}_Q, \vec{E}_Q) = (\vec{R}, \vec{E})$. Consequently the existence of a time sharing vector satisfying (183) is also a necessary condition for the achievability of a rates-exponents vector $(\vec{R}, \vec{E})$. Thus we can conclude that a rates-exponents vector $(\vec{R}, \vec{E})$ is achievable if and only if there exists an $\vec{\eta}$ satisfying (183).

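In order to prove the convexity of the region of achievable rates-exponents vectors, let $(\vec{R}_a, \vec{E}_a)$ and $(\vec{R}_b, \vec{E}_b)$ be two achievable rates-exponents vectors. Then there exist triples $(\vec{R}_a, \vec{E}_a, \vec{\eta}_a)$ and $(\vec{R}_b, \vec{E}_b, \vec{\eta}_b)$ satisfying (183).
For any $\alpha \in [0, 1]$ let $\vec{R}_\alpha$, $\vec{E}_\alpha$ and $\vec{\eta}_\alpha$ be

$$\vec{R}_\alpha = \alpha \vec{R}_a + (1 - \alpha) \vec{R}_b$$

$$\vec{E}_\alpha = \alpha \vec{E}_a + (1 - \alpha) \vec{E}_b$$

$$\vec{\eta}_\alpha = \alpha \vec{\eta}_a + (1 - \alpha) \vec{\eta}_b.$$

As $J(\cdot)$ is concave and the triples $(\vec{R}_a, \vec{E}_a, \vec{\eta}_a)$ and $(\vec{R}_b, \vec{E}_b, \vec{\eta}_b)$ satisfy the constraints given in (183), the triple $(\vec{R}_\alpha, \vec{E}_\alpha, \vec{\eta}_\alpha)$ also satisfies the constraints given in (183). Consequently the rates-exponents vector $(\vec{R}_\alpha, \vec{E}_\alpha)$ is achievable and the region of achievable rates-exponents vectors is convex. ∎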
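The characterization in (183) is easy to turn into a feasibility check for a given time sharing vector. The sketch below is illustrative only: it uses 0-based indices, assumed values $C = 1$ and $D = 2$, and a hypothetical quadratic stand-in `toy_J` for the paper's $J(\cdot)$; finding a suitable time sharing vector, rather than verifying one, is not attempted here.

```python
def toy_J(x, capacity=1.0, d_max=2.0):
    """Hypothetical concave decreasing stand-in for J, with toy_J(0) = d_max."""
    return d_max * (1.0 - (x / capacity) ** 2)

def satisfies_183(eta, rates, exponents, capacity=1.0, d_max=2.0, J=toy_J):
    """Return True if (rates, exponents, eta) meets constraints (183a)-(183d).
    Index i here corresponds to i + 1 in the text."""
    if any(share < 0.0 for share in eta) or sum(eta) > 1.0:
        return False                                   # (183c), (183d)
    base = (1.0 - sum(eta)) * d_max
    for i in range(len(rates)):
        if rates[i] > capacity * eta[i]:
            return False                               # (183b)
        # A phase with zero share carries zero rate and contributes nothing.
        tail = sum(eta[j] * J(rates[j] / eta[j])
                   for j in range(i + 1, len(rates)) if eta[j] > 0.0)
        if exponents[i] > base + tail:
            return False                               # (183a)
    return True

def mix(a, b, alpha):
    """Componentwise convex combination alpha * a + (1 - alpha) * b."""
    return [alpha * x + (1.0 - alpha) * y for x, y in zip(a, b)]
```

For instance, with `eta = [0.3, 0.4]` and `rates = [0.2, 0.3]`, the exponents `[0.9, 0.5]` pass the check while `[1.0, 0.5]` do not, and the midpoint of two passing triples passes as well, in line with the convexity argument above.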



References 

[1] P. Berlin, B. Nakiboglu, B. Rimoldi, and E. Telatar. A simple converse of Burnashev's reliability function. Information Theory, IEEE Transactions on, 55(7):3074-3080, July 2009.
[2] S. Borade, B. Nakiboglu, and L. Zheng. Unequal error protection: An information-theoretic perspective. Information Theory, IEEE Transactions on, 55(12):5511-5539, Dec. 2009.
[3] M. V. Burnashev. Data transmission over a discrete channel with feedback, random transmission time. Problemy Peredachi Informatsii, 12(4):10-30, 1976.
[4] F. Chung and L. Lu. Concentration inequalities and martingale inequalities: A survey. Internet Mathematics, 3(1):79-127, 2006.
[5] I. Csiszar. Joint source-channel error exponent. Problems of Control and Information Theory, Vol. 9, Iss. 5:315-328, 1980.
[6] S. K. Gorantla, B. Nakiboglu, T. P. Coleman, and L. Zheng. Bit-wise unequal error protection for variable length blockcodes with feedback. In Information Theory Proceedings (ISIT), 2010 IEEE International Symposium on: 241-245, June 2010.
[7] B. D. Kudryashov. On message transmission over a discrete channel with noiseless feedback. Problemy Peredachi Informatsii, 15(1):3-13, 1979.
[8] B. Nakiboglu and L. Zheng. Errors-and-erasures decoding for block codes with feedback. Information Theory, IEEE Transactions on, 58(1):24-49, Jan. 2012.
[9] B. Nazer, Y. Shkel, and S. C. Draper. The AWGN red alert problem. arXiv:1102.4411 [cs.IT], DOI:10.1109/TIT.2012.2235120.
[10] A. N. Shiriaev. Probability. Springer-Verlag Inc., New York, NY, USA, 1996.
[11] D. Wang, V. Chandar, S. Y. Chung, and G. W. Wornell. On reliability functions for single-message unequal error protection. In Information Theory Proceedings (ISIT), 2012 IEEE International Symposium on: 2934-2938, July 2012.
[12] H. Yamamoto and K. Itoh. Asymptotic performance of a modified Schalkwijk-Barron scheme for channels with noiseless feedback. Information Theory, IEEE Transactions on, 25(6):729-733, Nov. 1979.



⁶ Let the metric be $\|\vec{\eta} - \vec{\nu}\| = \max_j |\eta_j - \nu_j|$.



